Space Shuttle Catastrophic Failure Frequency

Executive  Summary: 

This  report  summarizes  the  Phase  1 analysis  activity  of  the  Space  Shuttle  Probabilistic  Risk  Assessment 
and  is  submitted  in  partial  fulfillment  of  the  requirements  of  contract  NAS#25809  Task  Order  25.  The 
purpose of this analysis is to update the summary results of the 1989 Independent Assessment of
Shuttle Accident Scenario Probabilities for the Galileo Mission (the Galileo study) [1] to reflect the
current  (April  1993)  test  and  operational  experience  base  of  the  Shuttle.  It  is  expected  that  this  analysis 
will  be  the  first  in  a series  of  periodic  or  event  driven  updates,  to  provide  a continuously  updated 
benchmark  for  the  catastrophic  failure  frequency  of  the  Shuttle. 

The  results  of  this  study  are  the  probability  distributions  of  failure  frequency  for  the  Space  Shuttle, 
summarized in Table 1 below.

Table 1: Risk of Catastrophic Failure for the Space Shuttle, post STS 56 (April 93)

PRA Phase 1 Study results - Based on 484,932 seconds SSME test, 55 flights - 0 SRB failures assumed.

(1 out of ...)                      5th %     20th %    50th %    Mean      80th %    95th %
93 RSRB Pair (Base)                 782       388       187       128       90        45
  (51-L failure not included)
93 SSME Cluster                     1550      741       342       213       153       71
93 ET                               86400     31900     11200     5200      3950      1460
93 Orbiter                          10100     5710      3140      2440      1720      974
93 Prelaunch                        4650      2850      1710      1430      1030      631
93 STS (Base)                       223       146       90        73        54        31
  (51-L failure not included)

RSRB Sensitivity 1 - includes the 51-L failure to update the Galileo study surrogate prior.

(1 out of ...)                      5th %     20th %    50th %    Mean      80th %    95th %
93 RSRB (Sensitivity 1)             216       128       74        60        43        25
  (includes 51-L failure)
93 STS (Sensitivity 1)              118       79        52        44        33        21
  (includes 51-L failure)


Shuttle  PRA  1 • Galileo  Study  Update 


This analysis differs from other major analyses of Space Shuttle reliability, notably the on-going
analyses of Space Shuttle Main Engines (SSMEs) [2] by F. Safie of Marshall Space Flight Center and
Rocketdyne's internal SSME reliability studies [3] in several important respects. First, this is a risk
assessment, not a reliability study; the difference is explored below. Second, the focus is on the
Shuttle as a whole, not the SSME. Third, this study is limited to assessing the probability of
catastrophic failure of the Shuttle (loss of vehicle, loss of crew). Finally, while this study draws on the
test stand experience of the SSME as the best available (non-flight) indicator of SSME in-flight
performance, it does not consider test stand experience to be a perfect indicator of in-flight
performance, and does not therefore directly combine flight and test experience to determine the
probability of SSME failure.

The principal conclusions of this study are: (1) The Space Shuttle is demonstrably one of the most
reliable launch vehicles today, and under reasonable assumptions may be considered the most reliable
launch vehicle today. (2) The Space Shuttle Main Engine (SSME) test program has had a significant
positive impact on the reliability of the Shuttle and has contributed greatly to demonstrating flight
reliability and crew safety. (3) The Redesigned Solid Rocket Booster (RSRB) is currently the most
significant contributor to the estimated residual risk of catastrophic failure of the Shuttle among the
major elements considered in this study (RSRB, SSME, External Tank (ET), Orbiter, and Prelaunch),
and will probably continue to dominate the estimated risk in the future since its reliability is
demonstrated only through flight successes.


Introduction: 

In April 1989 the Safety Division of the Office of the Associate Administrator for Safety, Reliability,
Maintainability, and Quality Assurance (Code QS) published the Independent Assessment of Shuttle
Accident Scenario Probabilities for the Galileo Mission (the Galileo study) [1]. The Galileo spacecraft
carried a radio-isotope thermionic generator (RTG), and the Office of Science and Technology Policy
(OSTP) required that an assessment of public risk arising from U.S. space launches involving
radioactive materials be performed prior to launch. The Galileo study was performed in response to
that requirement. One widely distributed result from the Galileo study was the set of catastrophic
failure frequency distributions for the Space Transportation System (Shuttle).

As part of the Probabilistic Risk Assessment of the Space Shuttle (Shuttle PRA) [5], the Office of Safety
and Mission Assurance (OSMA) and the Office of Space Flight directed Science Applications
International Corporation (SAIC) to update the Galileo study results to reflect the current (April 1993)
test and operational experience base of the Shuttle. It is expected that this analysis will be the first in a
series of periodic or event driven updates, to provide a continuously updated benchmark for the
catastrophic failure frequency of the Shuttle.

This study is a risk assessment, meaning primarily that its purpose is to facilitate the making of
decisions under risk. In particular this assessment is directed at understanding the risk to the crew and
to the payload associated with the use of the Space Shuttle as a launch vehicle. This study does not, in
and of itself, make the Shuttle any less "risky" or contribute directly to demonstrating the current level
of risk associated with the Shuttle. It does however define, describe, and quantify the risk to the Shuttle
and payload while the Shuttle is acting as a launch vehicle [...]. This information is potentially useful in
making decisions regarding the future role of the Shuttle; to assess the relative effectiveness of design,
engineering, or operational changes; or to determine what future changes might be required, and
whether the current level of risk requires any changes in the risk management strategy.

Decisions  are  characterized  by  uncertainty.  Uncertainty  is  generally  associated  with  the  subjective 
elements  of  a problem:  uncertainty  in  the  ability  to  accurately  model  a problem,  or  uncertainty  in  the 
applicability of various data to a problem, for example. A basic tenet of risk assessment is that
uncertainty  in  data  can  be  quantified  and  treated  mathematically  using  the  "logic  of  uncertainty."  The 
quantification  of  an  uncertain  decision  element  (datum)  is  accomplished  by  expressing  the  datum  as  a 
probability  distribution.  Distributions  are  also  used  to  model  variability  - the  physically  measurable 
differences  between  elements  in  a population  of  items  or  events.  A "pure"  or  classical  reliability  study 
deals  only  in  variability.  Subjective  uncertainty  is  dealt  with  in  a reliability  study  by  establishing 
ground  rules  or  making  assumptions  which  remove  uncertainty  from  the  modeled  problem  — the 
uncertainty  is  still  there,  but  it  is  removed  from  the  scope  of  the  analysis.  Another  approach  which  can 
be  employed  by  the  reliability  analyst  to  account  for  uncertainty  is  the  use  of  conservatism.  Adopting  a 
more  "conservative"  approach  reflects  that  a more  negative  outcome  is  postulated  than  that  which  might 
be  verified  by  additional  data  or  modeling.  In  either  case,  it  is  incumbent  upon  the  decision  maker  to 
understand  and  assess  the  uncertainty  implicit  in  the  ground  rules  and  assumptions  of  a classical 
reliability analysis before applying the results of that analysis in decision making.

Probability theory [8] permits the combination of the uncertainty and variability distributions associated
with a given parameter. Variability in classical reliability analysis is generally expressed using
confidence intervals - a measure of the likelihood (confidence) that the specified interval will contain
the actual mean value of a quantity subject to variability. Another aspect of uncertainty is tolerance - a
measure of the applicability of the parameter to the specific problem (e.g., hardware configuration,
application) at hand. Risk analysis uses analogous "uncertainty intervals" to express the distribution,
which combines both the tolerance of the estimate and the variability in the quantity. The expression
"confidence interval" is reserved to apply only to variability.

This  report  contains  a brief  overview  of  the  objective  of  this  study  and  the  analysis  methods  employed, 
followed by a summary of the results of the analysis. The analysis process is then defined in depth,
including  sufficient  discussion  of  the  data,  assumptions,  statistical  methods,  and  tools  used  to  allow 
audit  or  replication  and  extension  of  this  analysis  as  new  data  are  generated. 

Objective: 

The  objective  of  this  analysis  is  to  produce  an  up-to-date  set  of  probability  distributions  for  the  launch 
and  ascent  phase  catastrophic  failure  frequency  of  the  Shuttle.  These  distributions  will  be  generated  by 
updating  the  original  data  used  in  the  1988  Galileo  study,  and  will  preserve  the  assumptions  used  in  the 
Galileo  study. 

This risk assessment is intended to provide the decision maker with realistic estimates for the current
probability  of  catastrophic  failure  (failure  involving  loss  of  vehicle  and  loss  of  crew)  of  the  Space 
Shuttle.  The  application  of  the  logic  of  uncertainty  [8]  and  in  particular  Bayes'  Theorem  in  this  study 
permits  the  incorporation  of  relevant  engineering  information  which  could  not  be  included  in  a classical 
reliability  study.  The  inclusion  of  this  information  produces  results  that,  in  our  opinion,  are  a more 
accurate  reflection  of  the  engineering  realities  of  the  Space  Shuttle  than  a classical  reliability  study, 
which  must  rely  on  relatively  sparse  data.  Finally,  by  explicitly  quantifying  uncertainty  in  critical 
assumptions  and  ground  rules,  it  provides  support  for  defensible  judgments  and  decisions  under 
uncertainty.  (Even  if  the  decision  maker  disagrees  with  a particular  assumption  or  ground  rule,  the 
quantification of uncertainty provides some basis for understanding the impact of the assumption on the
quantitative results of the study.)


Scope: 

This study is intended to provide high-level insight into the current general catastrophic failure
probability of the Shuttle. For this reason, the focus of this study is not on particular failure scenarios or
mission phases, but on the major functional elements of the Shuttle. Moreover, this study is meant to
update results from the earlier Galileo study, not to be an independent assessment of the risk. The
underlying assumptions of the Galileo study, and the data used in that study, are not re-examined here.

It should be noted that the Galileo study has several limitations which are not addressed in this analysis:
because it was intended to serve only as an input to the assessment of the nuclear safety of the Galileo
mission, it dealt exclusively with catastrophic failures during pre-launch, launch, and ascent phases of
flight. It considered neither mission abort situations nor the on-orbit, reentry, and landing phases of the
flight.

Overview of the Analysis

Process  Overview: 

"I have but one lamp by which my feet are guided, and that is the lamp of experience;

I know of no way of judging the future but by the past."

Patrick Henry

Speech in the Virginia Convention; March 23, 1775

A broad introduction to the data and processes used to obtain the failure frequency distributions is
provided in this section. This information is presented in greater detail later in this report, and in
reference 1, the Galileo study final report. The approach used to determine the overall risk of
catastrophic failure of the Shuttle in the 1988 Galileo study was to analyze the system in terms of its
principal risk contributors, determine the distribution of failure frequencies associated with each of the
risk contributors, and combine those distributions to determine the overall catastrophic failure frequency
distribution associated with the Shuttle.
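
As a minimal sketch of this combination step, the following Python fragment treats each element's failure frequency as lognormal, fits it to the April 1993 median and 95th percentile listed in Table 1, and samples the system-level frequency as one minus the product of the element survival probabilities. The lognormal fit, the independence of the elements, the stratified (Latin Hypercube style) sampler, and every function and variable name are assumptions made for illustration. This is not the tool used in the study, and it starts from the pair and cluster values rather than from individual boosters and engines.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.95)

# (median, 95th percentile) as "1 out of N" values copied from Table 1.
ELEMENTS = {
    "RSRB pair": (187, 45),
    "SSME cluster": (342, 71),
    "ET": (11200, 1460),
    "Orbiter": (3140, 974),
    "Prelaunch": (1710, 631),
}

def lhs_lognormal(median, p95, n, rng):
    """Draw n stratified samples from a lognormal fitted to a median and 95th percentile."""
    mu = math.log(median)
    sigma = (math.log(p95) - mu) / Z95
    # One uniform draw per equal-probability stratum, then a random pairing order.
    us = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(us)
    return [
        math.exp(mu + sigma * STD_NORMAL.inv_cdf(min(max(u, 1e-12), 1 - 1e-12)))
        for u in us
    ]

def system_samples(n=5000, seed=1):
    """Sample the system failure frequency, assuming independent elements."""
    rng = random.Random(seed)
    columns = [lhs_lognormal(1 / m, 1 / p, n, rng) for m, p in ELEMENTS.values()]
    out = []
    for row in zip(*columns):
        survive = 1.0
        for p in row:
            survive *= 1.0 - min(p, 1.0)
        out.append(1.0 - survive)
    return sorted(out)

samples = system_samples()
mean = sum(samples) / len(samples)
median = samples[len(samples) // 2]
print(f"system mean   ~ 1 out of {1 / mean:.0f}")
print(f"system median ~ 1 out of {1 / median:.0f}")
```

Under these assumptions the sampled mean can be compared with the "93 STS (Base)" row of Table 1; any difference reflects the simplifications above rather than an error in the published values.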

This assessment is based on historical data. If it were based solely on historical data without any other
considerations it would indicate how the Shuttle has performed in the past, but, since the Shuttle has
experienced numerous design and operational changes since its first flight, it would not indicate how the
Shuttle is expected to perform today and in the future. Moreover, the amount of historical catastrophic
failure information directly pertinent to the Shuttle is (thankfully) sparse. To make a realistic estimate
of the current catastrophic failure probability, this analysis must therefore deal with two limitations in
the available data. It must somehow modify or filter the data to reflect the operational and design
changes in the system (incorporate reliability growth); and it must supplement the sparse data with
relevant information from other sources. In general, reliability growth can be accommodated in one of
two ways. The approach here is to segregate (filter) the underlying failure data into sets containing
those failures which would occur on the current Shuttle and those which would not. An alternative
approach is to modify the analysis model to reflect growth (e.g., to weight the failure occurrences based
upon their currency).

The principal risk contributors (risk elements) in the Shuttle are the Solid Rocket Booster (SRB) pair,
the Space Shuttle Main Engine (SSME) cluster, the External Tank (ET), the Orbiter, and Prelaunch
activities. In this analysis SSME start-up failures are included as part of the SSME cluster risk element,
rather than as part of the Prelaunch risk element. The SRB pair and SSME cluster contribute on the
order of 90% of the total risk.
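
The 90% figure can be checked approximately against the mean values in Table 1. The short Python sketch below converts the April 1993 "1 out of N" means to frequencies and computes the share attributable to the RSRB pair and SSME cluster. Using the means, and adding them as a rare-event approximation of the system mean, are simplifications assumed here; the variable names are illustrative.

```python
# Mean "1 out of N" values for April 1993, copied from Table 1.
MEAN_ONE_IN = {
    "RSRB pair": 128,
    "SSME cluster": 213,
    "ET": 5200,
    "Orbiter": 2440,
    "Prelaunch": 1430,
}
STS_MEAN_ONE_IN = 73  # "93 STS (Base)" mean from Table 1

means = {name: 1.0 / n for name, n in MEAN_ONE_IN.items()}

# Share of the reported system mean carried by the two propulsion elements.
share = (means["RSRB pair"] + means["SSME cluster"]) / (1.0 / STS_MEAN_ONE_IN)

# Rare-event approximation: the system mean is close to the sum of element means.
rare_event_sum = sum(means.values())

print(f"RSRB pair + SSME cluster share of mean risk: {share:.0%}")
print(f"sum of element means ~ 1 out of {1 / rare_event_sum:.0f}")
```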

To  derive  failure  frequency  distributions  for  each  of  the  risk  contributors,  a prior  distribution  of  failure 
frequencies  is  found  based  on  the  performance  of  surrogate  components  or  systems.  Bayes'  theorem  is 
then used to update that prior information with the operational flight performance of the element. As
applied  here,  a prior  distribution  refers  to  the  best  available  indicator  of  in-flight  reliability  performance 
of  the  risk  contributor,  short  of  the  actual  in-flight  experience.  The  term  surrogate  means  a system  or 
component  sufficiently  like  the  reference  system  or  component  in  form,  function,  application,  and 
environment  that  the  failure  frequency  of  the  surrogate  is  a reasonable  indicator  of  the  failure  frequency 
for  the  reference  system. 

The  Bayesian  update  process  used  in  these  studies  has  the  general  property  of  reducing  the  range  of 
uncertainty  associated  with  a failure  frequency  distribution,  relative  to  classical  statistical  methods.  This 
method is employed based on the belief that there are data available, other than direct flight experience
of the Shuttle, which allow us to determine the catastrophic failure frequency of the Shuttle with
greater certainty than flight experience alone would allow. For example, we believe that SSME test
stand  experience  provides  useful  information  regarding  the  catastrophic  failure  potential  of  the  SSME. 
At  the  same  time,  we  do  not  think  that  test  stand  experience  is  a perfect  indicator  of  in-flight  SSME 
performance,  so  it  would  be  inappropriate  to  pool  test  performance  directly  with  operational  experience. 
Bayes'  theorem  allows  us  to  supplement  the  relatively  scant  flight  experience  of  the  Shuttle  by  building 
on  the  infrastructure  of  confidence  established  by  test  experience.  The  objective  is  to  find  a prior 
distribution  that  is  the  best  available  indicator  of  in-flight  performance,  and  combine  that  prior 
knowledge  with  actual  flight  experience  to  produce  a result  in  which  we  are  more  certain  than  would 
have  been  possible  using  flight  experience  alone. 

The  prior  probability  distributions  for  each  of  the  Shuttle  risk-contributor  elements  were  selected  to  be 
the  best  available  indicators  of  Shuttle  in-flight  performance  (other  than  actual  Shuttle  flight 
experience).  The  prior  distribution  for  the  SRB  was  obtained  by  aggregating  the  performance  of  U.S. 
solid  rocket  systems.  For  the  SSME,  the  prior  distribution  was  obtained  by  examining  SSME  test  stand 
performance. The prior for the Orbiter was obtained by combining Auxiliary Power Unit (APU)
information from the Shuttle Probabilistic Risk Assessment Proof of Concept Study [6] with generic
component  information  for  other,  non-propulsion,  Orbiter  systems.  For  the  Prelaunch  prior,  generic 
component  information  was  used  as  surrogate  data  for  the  failure  modes  which  would  contribute  to  a 
pad  fire  or  explosion  in  Launch  Support  Equipment  (LSE),  or  inadvertent  destruction  of  the  Shuttle  by 
Range  Safety  Equipment  (RSE). 

The  prior  failure  frequency  distributions  are  combined  with  the  actual  flight  experience  of  the  Shuttle 
using  Bayes'  theorem  to  produce  "Bayesian  Posterior"  distributions.  Because  it  combines  significant 
failure-related  information  about  the  system  (in  the  prior  distributions)  with  flight  experience,  the 
Bayesian  posterior  generally  provides  a more  useful  and  accurate  indicator  of  the  actual  failure 
frequency  performance  of  the  system  than  a distribution  derived  from  flight  experience  alone.  In  the 
summary  charts  throughout  this  report,  the  distributions  reported  are  Bayesian  posterior  distributions. 

A rigorous  treatment  of  Bayes'  Theorem  is  provided  in  Appendix  B and  Appendix  C. 
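
For readers who want a concrete picture before turning to the appendices, the following Python sketch applies Bayes' theorem on a numerical grid: a lognormal prior on the per-flight failure frequency is multiplied by a binomial likelihood for the observed flight record. The prior is fitted to the 1988 Orbiter median and 95th percentile from Table 2, and the evidence is zero failures in 55 flights. The lognormal prior shape, the grid method, and the function names are assumptions made for illustration; the study's actual procedure is the one documented in Appendices B and C.

```python
import math
from statistics import NormalDist

def lognormal_params(median, p95):
    """Return (mu, sigma) of ln(p) for a lognormal with this median and 95th percentile."""
    z95 = NormalDist().inv_cdf(0.95)
    mu = math.log(median)
    sigma = (math.log(p95) - mu) / z95
    return mu, sigma

def grid_update(mu, sigma, failures, trials, n_grid=4000):
    """Return (prior mean, posterior mean) after a binomial update on a log-spaced grid."""
    lo = mu - 8.0 * sigma
    hi = min(mu + 8.0 * sigma, math.log(0.5))
    step = (hi - lo) / (n_grid - 1)
    grid = [math.exp(lo + i * step) for i in range(n_grid)]
    # The grid is uniform in ln(p), so the prior weight is a normal density in ln(p).
    prior = [math.exp(-0.5 * ((math.log(p) - mu) / sigma) ** 2) for p in grid]
    # Binomial likelihood; the constant binomial coefficient cancels on normalization.
    like = [p ** failures * (1.0 - p) ** (trials - failures) for p in grid]
    post = [w * l for w, l in zip(prior, like)]
    prior_mean = sum(p * w for p, w in zip(grid, prior)) / sum(prior)
    post_mean = sum(p * w for p, w in zip(grid, post)) / sum(post)
    return prior_mean, post_mean

# 1988 Orbiter values from Table 2: median 3.45E-04, 95th percentile 1.11E-03.
mu, sigma = lognormal_params(median=3.45e-4, p95=1.11e-3)
prior_mean, post_mean = grid_update(mu, sigma, failures=0, trials=55)
print(f"prior mean     {prior_mean:.2e}")
print(f"posterior mean {post_mean:.2e}")
```

Because the prior frequency is small compared with one failure in 55 flights, a failure-free record shifts the distribution only slightly, which is the behavior expected when the prior already implies a low failure frequency.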

Note  that  Bayesian  updates  were  only  performed  for  the  SRB  and  SSME  in  the  1988  Galileo  study. 

The objective of this study was to update the results of the Galileo study to reflect current operational
experience, so a Bayesian update using the operational experience (no failures in fifty five flights) was
performed for the Orbiter, ET, and Prelaunch risk elements. This constitutes a minor change to the
ground rule preserving the method of the original Galileo study. Since the failure frequencies for the
Orbiter, ET, and Prelaunch elements are small, the Bayesian update resulted in little change to the
prior distribution, and had essentially no effect on the system level failure frequency.

The Bayesian posterior distributions for two individual SRBs and three individual SSMEs were
combined using a Monte Carlo simulation to determine the failure frequency distribution for the SRB
pair and SSME three engine cluster. These distributions were then combined (again using simulation)
with the Orbiter, ET, and Prelaunch distributions to produce the STS system level failure frequency
distribution. It should be noted for completeness that while the Galileo study contractor did use Monte
Carlo sampling in their simulations, the current study used the Latin Hypercube sampling method,
which provides somewhat more accurate results in the form of distributions. A discussion and
comparison of Monte Carlo and Latin Hypercube sampling methods is provided in Appendix E.

This study makes extensive use of the lognormal distribution to express the uncertain quantity "failure
frequency." Risk analysts have found the lognormal distribution well suited to conveying uncertainty in
failure frequency distributions in a wide spectrum of applications.

SRB Sensitivity Case

The only in-flight catastrophic failure experienced by the Shuttle was the STS 51-L (Challenger) SRB
failure. The prevailing belief at the time of the Galileo study and today is that the failure mode which
caused that accident was removed in the Redesigned Solid Rocket Booster (RSRB). At NASA's
direction, the base case therefore did not include the Challenger SRB failure in the calculation of SRB
failure frequency. To determine how inclusion of that failure would affect the system level outcome, a
sensitivity case was added. This case is labeled Sensitivity 1.

At the time of the Galileo study there was insufficient SRB test or operational experience to derive a
practical and meaningful failure frequency distribution based on SRB experience alone. Based on the
experience of one failure in fifty SRB launches, classical statistical methods yield a mean SRB failure
frequency of 1/50, and ninety percent certainty bounds of 1/11 and 1/975. The knowledge that we are
ninety-percent-confident that the SRB failure frequency is between (essentially) one in ten and one in
one thousand may be statistically meaningful, but is of little practical engineering value.

The SRB prior distribution was therefore (necessarily) derived from solid rocket sources not directly
related to the SRB. Given the increased experience now available, it is appropriate to question whether
such a prior is still "the best available indicator of in-flight performance." To determine the extent to
which the SRB prior impacts the Shuttle system-level failure frequency distributions, two additional
sensitivity cases were analyzed without using the solid rocket prior. The Sensitivity 2 case retains the
STS 51-L failure as a valid member of the RSRB reliability data set, and a RSRB failure frequency
distribution is calculated based on one failure in 110 SRB / RSRB flights. The Sensitivity 3 case
discounts the STS 51-L failure as having been fully corrected, and the RSRB failure frequency
distribution is calculated assuming one-third of a failure in 109 (counted) SRB / RSRB flights. Note:
justification for the use of one-third of a failure is discussed more fully in Appendix D.

Results of the Analysis

Table 2 below shows the distributions derived directly from the published Galileo study results. Table
3 shows the Galileo-era intermediate results of this study. Table 4 depicts the April 1993 updated
results. The Galileo-era results in Table 3 were produced and presented because minor differences in
the tools and statistical methods employed in the earlier study relative to this one resulted in slightly
different results, particularly in the tails of the distributions. These differences and the reasons for them
are discussed in detail in Appendices A and H. The differences between the original Galileo study
results and the Galileo-era results of this study, shown in Tables 2 and 3, are process-oriented. Both
studies used the same data and underlying assumptions. The differences between the Galileo-era results
in Table 3 and the April 93 results in Table 4 are due entirely to the experience acquired since the
Galileo mission.
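
The classical bounds quoted in the SRB sensitivity discussion (1/11 and 1/975 for one failure in fifty SRB launches) are consistent with exact binomial limits taken at the 5% and 95% levels. The report does not state which classical method was used, so the choice of exact (Clopper-Pearson style) limits in the sketch below is an assumption, as are the function names.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def solve_increasing(h, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for h(p) = target, where h is increasing in p on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

failures, launches = 1, 50

# Lower limit: the frequency at which seeing >= 1 failure has only a 5% chance.
lower = solve_increasing(lambda p: 1.0 - binom_cdf(failures - 1, launches, p), 0.05)
# Upper limit: the frequency at which seeing <= 1 failure has only a 5% chance.
upper = solve_increasing(lambda p: 1.0 - binom_cdf(failures, launches, p), 0.95)

print(f"point estimate: 1/{launches // failures}")
print(f"lower bound   : 1/{1 / lower:.0f}")
print(f"upper bound   : 1/{1 / upper:.0f}")
```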


Table 2: Risk of Catastrophic Failure for the Space Shuttle, STS 34 (October 88)
Original Galileo Study Results

Galileo Study results - Based on 294,230 seconds SSME test, 31 flights - 0 SRB failures assumed.

                                    5th %     20th %    50th %    Mean      80th %    95th %
88 SRB Pair (Base)                  7.69E-04  1.60E-03  3.60E-03  5.49E-03  8.06E-03  1.72E-02
  (51-L failure not included)
  (1 out of ...)                    1300      624       278       182       124       58
88 SSME Cluster                     8.33E-04  2.18E-03  5.85E-03  1.09E-02  1.56E-02  3.85E-02
  (1 out of ...)                    1200      458       171       92        64        26
88 ET                               1.25E-05  3.45E-05  1.00E-04  2.00E-04  2.86E-04  7.69E-04
  (1 out of ...)                    80000     29000     10000     5000      3500      1300
88 Orbiter                          1.09E-04  1.89E-04  3.45E-04  4.17E-04  6.25E-04  1.11E-03
  (1 out of ...)                    9200      5300      2900      2400      1600      900
88 Prelaunch                        2.94E-04  3.85E-04  5.26E-04  7.14E-04  7.69E-04  1.43E-03
  (1 out of ...)                    3400      2600      1900      1400      1300      700
88 STS (Base)                       2.86E-03  5.95E-03  1.28E-02  1.82E-02  2.78E-02  5.56E-02
  (51-L failure not included)
  (1 out of ...)                    350       168       78        55        36        18
88 Reliability (Base)               0.997     0.994     0.987     0.982     0.973     0.946

Galileo Study Sensitivity Case 1 - Based on 294,230 seconds SSME test, 31 flights - 1 SRB failure

                                    5th %     20th %    50th %    Mean      80th %    95th %
88 SRB Pair (Sensitivity 1)         1.80E-03  3.98E-03  9.17E-03  1.54E-02  2.08E-02  4.55E-02
  (includes 51-L)
  (1 out of ...)                    555       251       109       65        48        22
88 STS (Sensitivity 1)              4.95E-03  9.80E-03  2.00E-02  2.78E-02  4.17E-02  7.69E-02
  (includes 51-L)
  (1 out of ...)                    202       102       50        36        24        13
88 Reliability (Sensitivity 1)      0.995     0.990     0.980     0.973     0.959     0.926

Tables  2, 3,  and  4 show  the  failure  frequencies  associated  with  each  of  the  risk  contributors  at  the  fifth, 
twentieth,  fiftieth  (median),  eightieth,  and  ninety-fifth  percentiles,  as  well  as  the  means  of  the  failure 
frequency  distributions.  Also  shown  for  each  risk  contributor  are  the  mean  flights  between  failure 
(mfbf) associated with the failure frequency. At the system level the reliability associated with the
failure  frequency  is  also  listed. 
As they are used here, the percentile rankings should be interpreted as follows: "We are 95% certain that
the actual {failure frequency / reliability} is better than the value shown in the '95th%' column."; or "We
are 60% certain that the actual {failure frequency / reliability} falls between the '20th%' and '80th%'
values."; and so on. The mean and the median ('50th%') are both widely used indicators of the central
tendencies of these distributions. They are not equal because these distributions are skewed - for the
failure frequencies, the distributions are lognormal or nearly lognormal, meaning that the logarithms of
the failure frequencies are normally distributed. Historically, the median of the Galileo study
distributions has been used as the "point estimate of choice" when referring to these results using a
single value.

Table 3: Risk of Catastrophic Failure for the Space Shuttle, STS 34 (October 88)
Phase 1 Shuttle PRA - Galileo-era Intermediate Results

Galileo era (Intermediate) Results - Based on 294,230 seconds SSME test, 31 flights - 0 SRB failures assumed.
(ET, Orbiter, and Prelaunch distributions are the same as Table 2)

                                    5th %     20th %    50th %    Mean      80th %    95th %
88 SRB Pair (PRA Base)              1.56E-03  3.28E-03  6.81E-03  9.90E-03  1.41E-02  2.83E-02
  (51-L failure not included)
  (1 out of ...)                    642       305       147       101       71        35
88 SSME Cluster (PRA)               2.49E-04  9.45E-04  2.84E-03  7.38E-03  8.35E-03  2.66E-02
  (1 out of ...)                    4020      1060      352       136       120       38
88 ET                               1.25E-05  3.45E-05  1.00E-04  2.00E-04  2.86E-04  7.69E-04
  (1 out of ...)                    80000     29000     10000     5000      3500      1300
88 Orbiter                          1.09E-04  1.89E-04  3.45E-04  ...       6.25E-04  1.11E-03
  (1 out of ...)                    9200      5300      2900      ...       1600      900
88 Prelaunch                        2.94E-04  3.85E-04  5.26E-04  ...       7.69E-04  1.43E-03
  (1 out of ...)                    3400      2600      1900      ...       1300      700
88 STS (PRA Base)                   4.59E-03  7.70E-03  1.36E-02  ...       2.49E-02  4.?6E-02
  (51-L failure not included)
  (1 out of ...)                    218       130       74        ...       40        21
88 Reliability (PRA Base)           0.995     0.992     0.987     ...       0.975     0.953

Sensitivity 1 - Based on 294,230 seconds SSME test, 31 flights - 1 failure

                                    5th %     20th %    50th %    Mean      80th %    95th %
88 SRB Pair (PRA Sensitivity 1)     5.88E-03  9.92E-03  1.71E-02  ...       2.96E-02  4.98E-02
  (includes 51-L failure)
  (1 out of ...)                    170       101       58        ...       34        21
88 STS (PRA Sensitivity 1)          9.70E-03  1.51E-02  2.43E-02  ...       4.01E-02  6.68E-02
  (includes 51-L failure)
  (1 out of ...)                    103       66        ...       ...       25        15
88 Reliability (PRA Sensitivity 1)  0.990     0.985     ...       ...       0.961     0.935
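
The gap between the mean and the median of a skewed distribution, described before Table 3, can be checked numerically. For a lognormal distribution the mean equals the median multiplied by exp(sigma^2 / 2), where sigma is the standard deviation of the logarithm. The Python sketch below fits sigma from the April 1993 RSRB pair median and 95th percentile in Table 1 and compares the implied mean with the tabulated one. Treating the distribution as exactly lognormal is an assumption, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# "93 RSRB Pair (Base)" values from Table 1, expressed as 1 out of N.
MEDIAN_ONE_IN = 187
P95_ONE_IN = 45
REPORTED_MEAN_ONE_IN = 128

median = 1.0 / MEDIAN_ONE_IN
p95 = 1.0 / P95_ONE_IN

# For a lognormal, ln(p95) = ln(median) + z95 * sigma.
z95 = NormalDist().inv_cdf(0.95)
sigma = math.log(p95 / median) / z95

# Lognormal mean in terms of the median and sigma.
implied_mean = median * math.exp(sigma**2 / 2.0)

print(f"sigma of ln(frequency): {sigma:.3f}")
print(f"implied mean : 1 out of {1 / implied_mean:.1f}")
print(f"reported mean: 1 out of {REPORTED_MEAN_ONE_IN}")
```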
Table 4: Risk of Catastrophic Failure for the Space Shuttle, post-STS 56 (April 93)
STS PRA Phase 1 Study Results

PRA Phase 1 Study results - Based on 484,932 seconds SSME test, 55 flights - 0 SRB failures assumed.

                                    5th %     20th %    50th %    Mean      80th %    95th %
93 RSRB Pair (Base)                 1.28E-03  2.58E-03  5.35E-03  7.80E-03  1.11E-02  2.23E-02
  (51-L failure not included)
  (1 out of ...)                    782       388       187       128       90        45
93 SSME Cluster                     6.46E-04  1.35E-03  2.92E-03  4.69E-03  6.53E-03  1.41E-02
  (1 out of ...)                    1550      741       342       213       153       71
93 ET                               1.16E-05  3.14E-05  8.91E-05  1.92E-04  2.53E-04  6.85E-04
  (1 out of ...)                    86400     31900     11200     5200      3950      1460
93 Orbiter                          9.89E-05  1.75E-04  3.19E-04  4.10E-04  5.80E-04  1.03E-03
  (1 out of ...)                    10100     5710      3140      2440      1720      974
93 Prelaunch                        2.15E-04  3.50E-04  5.84E-04  7.02E-04  9.73E-04  1.58E-03
  (1 out of ...)                    4650      2850      1710      1430      1030      631
93 STS (Base)                       4.48E-03  6.83E-03  1.11E-02  1.38E-02  1.86E-02  3.20E-02
  (51-L failure not included)
  (1 out of ...)                    223       146       90        73        54        31
93 Reliability                      0.996     0.993     0.989     0.986     0.982     0.969

RSRB Sensitivity 1 - includes the 51-L failure to update the Galileo study surrogate prior.

                                    5th %     20th %    50th %    Mean      80th %    95th %
93 RSRB Pair (Sensitivity 1)        4.63E-03  7.80E-03  1.35E-02  1.66E-02  2.33E-02  3.92E-02
  (includes 51-L failure)
  (1 out of ...)                    216       128       74        60        43        25
93 STS (Sensitivity 1)              8.48E-03  1.27E-02  1.94E-02  2.26E-02  3.04E-02  4.77E-02
  (includes 51-L failure)
  (1 out of ...)                    118       79        52        44        33        21
93 Reliability (Sensitivity 1)      0.992     0.987     0.981     0.978     0.970     0.953

| RSRB  Sei)sidvity2 

* No  prior,  1 failure  in  1 10  SRB  launches 

93  RSRB  Pair  (ScnsidvKy2) 

3.14E-04 

1.18E-03 

4.70E-03 

1.82E-02 

1.88E-02 

7.03E-02 

1 

1 

1 

1 

I 

1 

(1  out  of ...) 

3180 

850 

213 

55 

53 

14 

93  STS  (Sensitivity^) 

331E-03 

5.67E-03 

1.I2E-02 

242E-02 

2.^7£-02 

7.72E-02 

1 

1 

1 

1 

i 

1 

(1  out  of  ~) 

302 

176 

89 

41 

37 

13 

93  Reliability  (Seiuitivitv2) 

0.997 

0.994 

0.989 

0.976 

0.974 

0.926 

| RSRB  Sensitivtty3  - 

No  prior,  0 failures  in  109  SRB  Launches 

93  RSRB  Pair  (Sensidvity3) 

1.06E-04 

3.95E-04 

1.58E-03 

6.HE-03 

6.31E-03 

2.36E-02 

1 

1 

1 

1 

1 

1 

(1  out  of  >..) 

9480 

2530 

633 

164 

42 

93  STS  (Sensitivity3) 

2.54E-03 

4.11E-03 

7.44E-03 

1.21E-02 

1.48E-02 

3-38E-02 

(1  out  of ._) 

1 

1 

1 

1 

1 

1 

394 

243 

134 

83 

68 

30 

93  Reliability  (Sensitlvlty3) 

0.997 

0.996 

0.993 

0.988 

0.985 

0.967 
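
The rows of Table 4 are related by simple arithmetic, which the short sketch below illustrates. It assumes that the "1 out of ..." rows are the reciprocal of the failure frequency and that the reliability rows follow exp(-frequency); the function names are illustrative and do not come from the study.

```python
import math

def one_in_n(frequency):
    """Return N so that a failure frequency reads as '1 out of N' flights."""
    return 1.0 / frequency

def reliability(frequency):
    """Probability of no failure in one flight (assumed form: exp(-frequency))."""
    return math.exp(-frequency)

# Base-case STS median and 95th percentile frequencies as listed in Table 4.
for label, freq in [("50th %", 1.11e-2), ("95th %", 3.20e-2)]:
    print(label, round(one_in_n(freq)), round(reliability(freq), 3))
```

Because the published values were computed from unrounded frequencies, the last digit of a reciprocal recomputed from the three-figure table entries can differ slightly from the tabulated "1 out of ..." value.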
This analysis concludes that with ninety percent certainty, the current risk of a catastrophic failure
leading to loss of a Shuttle during the Prelaunch, Launch, and Ascent phases of a mission is between
one-in-thirty-one (1/31) and one-in-two-hundred-twenty-three (1/223), with a mean risk of
one-in-seventy-three (1/73), and median of one-in-ninety (1/90). This is an improvement in estimated mean
flights between failure of [illegible]% at the mean and 72% at the worst-case end of the certainty interval (95th
percentile) over the computed risk for STS-34, the Galileo mission. (In terms of estimated failure
frequency, this is an improvement of 23% at the mean, and 41% at the 95th percentile.) The principal
source of this improvement is increased confidence in the SSME. The SSME data gathered since the
Galileo study includes: 1 failure in 190,701 seconds of test operation (the equivalent of 122 Shuttle
flights); 473 test starts (equivalent to 157 launch starts); and 24 Shuttle flights. This means that the
SSME has been accumulating statistically relevant experience at the rate of 4 or 5 equivalent flights per
mission. This "experience multiplier" has the effect of improving confidence in the reliability of the
SSME much more quickly than if the SSME were exposed to failure only during actual launches. In
contrast, the RSRB is only exposed to failure during a Shuttle launch, so the confidence in its reliability
performance builds relatively slowly.
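
The "experience multiplier" can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the assumption that an equivalent launch start is one start for each of three engines is ours, as are the variable names.

```python
# Figures quoted in the text for SSME experience gathered since the Galileo study.
test_seconds = 190_701
equivalent_flights = 122
test_starts = 473
shuttle_flights = 24
engines_per_flight = 3  # assumption: one start per engine per launch

# Test seconds that correspond to one equivalent flight of the engine cluster.
seconds_per_equivalent_flight = test_seconds / equivalent_flights

# Test starts expressed as equivalent launch starts.
equivalent_launch_starts = test_starts // engines_per_flight

# Equivalent flights of test experience gained per actual mission flown.
multiplier = equivalent_flights / shuttle_flights

print(round(seconds_per_equivalent_flight), equivalent_launch_starts, round(multiplier, 1))
```

The ratio of about five test-equivalent flights per mission is consistent with the "4 or 5 equivalent flights per mission" stated above.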
The Orbiter, ET, and Prelaunch risk contributions remain insignificant compared to the SSME cluster
and  RSRB  pair.  Currently,  the  RSRBs  account  for  between  29%  and  70%  of  the  overall  risk  to  the 
Shuttle  (57%  at  the  mean).  The  SSMEs  contribute  between  14%  and  44%  (34%  at  the  mean), 
Prelaunch  contributes  about  5%,  the  Orbiter  3%  and  the  External  Tank  1%.  The  non-propulsion 
systems  contribute  less  than  10%  of  the  total  launch  and  ascent  phase  risk  to  the  Shuttle  system.  The 
contribution  of  each  risk  element  to  the  total  Shuttle  risk  is  depicted  graphically  in  Figure  2. 


Figure  2.  Risk  Element  Fractional  Contributions  to  STS  Total  Risk 
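
The contributions at the mean can be reproduced from the mean failure frequencies listed in Table 4. The following is a minimal sketch, assuming the total is approximated by the sum of the element means (the study itself combines the distributions by simulation); the dictionary and variable names are illustrative.

```python
# Mean failure frequencies per flight for each risk element, from Table 4.
mean_frequency = {
    "RSRB pair": 7.80e-3,
    "SSME cluster": 4.69e-3,
    "Prelaunch": 7.02e-4,
    "Orbiter": 4.10e-4,
    "External Tank": 1.92e-4,
}

# Assumption: for small frequencies the system mean is close to the sum of the parts.
total = sum(mean_frequency.values())

# Fractional contribution of each element, in whole percent.
share_percent = {name: round(100 * f / total) for name, f in mean_frequency.items()}

for name, pct in share_percent.items():
    print(f"{name}: {pct}%")
```

The shares come out at 57%, 34%, 5%, 3% and 1%, matching the mean contributions quoted in the text.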
Discussion of the Analysis:

Solid  Rocket  Boosters 

As discussed in the preceding section, a base case and three sensitivity cases were used to calculate the
failure frequency for the RSRBs. The baseline case used a prior distribution comprised of the aggregate
of U.S. solid rocket experience in launch vehicles, and updated that prior with actual Shuttle flight
experience, discounting the STS 51-L SRB failure. The first sensitivity case (Sensitivity1) used the
same prior distribution, but included the STS 51-L failure in the update data. The second sensitivity
case (Sensitivity2) used no prior distribution and calculated the failure frequency distribution directly
from the operational experience of one failure in one-hundred-ten (R)SRB-launches. The third
sensitivity case (Sensitivity3) used no prior and calculated the RSRB failure frequency distribution
based on no (0) failures in one-hundred-nine (R)SRB-launches. The nomenclature (R)SRB refers to the
combination of SRB and RSRB experience.

As of the time of the Galileo study, there had been twenty-five STS launches, or fifty SRB exposures.

The Galileo risk estimate is based on the inclusion of the experience of six successful Shuttle flights
prior to the Galileo mission. This was appropriate for risk estimation based on the existing launch
schedule. If there had been a failure in one of those six missions, the assessment would have been
revised to reflect the true experience up to Galileo launch time. The total SRB exposures used in the
Galileo study and Galileo-era intermediate analysis of this study is therefore 62 SRB-launches. The
current study incorporates operational data through STS-56, fifty-five Shuttle launches. The total
exposure is therefore 109 SRB-launches (discounting the STS 51-L failure), or 110 SRB-launches if the
failure is counted. It should be noted that the Sensitivity3 case discounts the STS 51-L failure both as a
failure and as an exposure. The Galileo study retained the 51-L failure as an exposure in calculating the
failure frequency distribution, and this apparent oversight was retained in the Galileo-era results
calculated for this study (Table 3). For the Galileo study the magnitude of the difference (at the system
level) was approximately [illegible], not significant when compared to the overall size of the certainty interval.
In the current study the magnitude of the difference is under 2%.
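
The exposure counts above follow from two boosters per launch. A minimal sketch of the bookkeeping, with illustrative names that are not taken from the study:

```python
def srb_exposures(launches, boosters_per_launch=2):
    """Number of SRB-launches (exposures) accumulated over a number of Shuttle launches."""
    return launches * boosters_per_launch

# Galileo study: 25 flown launches plus 6 assumed successful flights before Galileo.
galileo_exposures = srb_exposures(25 + 6)

# Current study: 55 launches through STS-56, counting the failed 51-L booster.
current_exposures = srb_exposures(55)

# Discounting the failed 51-L booster as both a failure and an exposure.
current_exposures_without_51l = current_exposures - 1

print(galileo_exposures, current_exposures, current_exposures_without_51l)
```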

The failure frequencies for the Sensitivity2 and Sensitivity3 cases, and for the surrogate solid rockets
used in the prior, were calculated as discussed in the subsequent Treatment of Demand Related Failures
section of this report.

(Note: The mathematical expressions presented in the body of this report are intended to allow the
interested reader to follow and verify the major calculations. Unwieldy or extensive calculations
required to completely replicate this study are presented in the appendices. The mathematical
expressions used here and throughout the body of this report use verbose variable names and
mathematical operators as they would appear in a computer program or spreadsheet. This convention was
selected largely as a result of the difficulty experienced reproducing the Galileo study results, which
was in part due to non-standard (or different standard) mathematical nomenclature. The convention
used here sacrifices a little in readability, but ensures that the results can be easily replicated.
Mathematical functions and conventions are from Microsoft EXCEL™ v4.0 and are shown in
Boldface.)
Treatment of Demand Related Failures (SRBs and SSME Start Failures):

1.  The  means  of  the  failure  frequency  distributions  were  set  equal  to  the  maximum  likelihood 
estimator  (MLE). 

MEAN  = MLE  = Failures  / Exposures 

1.a. In the RSRB Sensitivity3 case there were no failures. Given the experience base for this study,
it was felt that a mean based on the assumption of one-third (1/3) of a failure was justified.
Specifically, since the exposure accumulated by the SRB to date (110 SRB-launches) is well within
the range of mean trials to failure (mttf) predicted by the surrogate experience, an assumed mean is
both justifiable and more informative than basing a distribution

Appendix  E contains  a lengthy  justification  for  this  assumption. 

2.  The  fifth-percentile  (LOWER)  and  ninety-fifth-percentile  (UPPER)  of  the  distribution  for 
demand  related  failure  frequencies  were  calculated  in  terms  of  the  F distribution. 

LOWER = (Failures*FINV(0.95, 2*Failures, 2*Exposures-2*Failures+2)) /
(Failures*FINV(0.95, 2*Failures, 2*Exposures-2*Failures+2) + (Exposures-Failures+1))

UPPER = ((Failures+1)*FINV(0.05, 2*Failures+2, 2*Exposures-2*Failures)) /
((Failures+1)*FINV(0.05, 2*Failures+2, 2*Exposures-2*Failures) + (Exposures-Failures))

(FINV is the inverse F distribution function with arguments "percentile", "numerator degrees of
freedom", and "denominator degrees of freedom".)

2.a. For the zero failure case (Sensitivity3), the F distribution LOWER is 0.00; however, when the
distribution is converted to a lognormal (Step 5 below), the relationship between the MEAN and the
UPPER is used to set a lower boundary on the distribution.

3. The MEDIAN of the distribution was found. (This calculation is tedious; see Appendix C.)

4.  The  lognormal  error  factor  (EF)  is  found. 

EF  = UPPER /MEDIAN 

5.  The  distribution  was  converted  to  a lognormal,  preserving  the  MEAN  and  EF. 

See  Appendix  C. 

6.  For  the  surrogate  distributions  contributing  to  the  prior,  the  resulting  distributions  were  then 
aggregated  using  CARP™  (Computerized  Aggregation  of  Reliability  Parameters)  or  CARP2™. 

See  Appendix  C for  a discussion  of  the  aggregation  process. 

7.  The  aggregate  prior  distribution  is  converted  to  a lognormal,  preserving  its  mean  and  median. 

8. The converted (lognormal) prior is updated with the flight exposure data using Bayes' theorem as
follows (see Appendix C for detailed derivation of the mathematics involved):
8.a. The (failure-on-demand) prior is converted to a Beta distribution preserving the mean (M) and
the variance (V). The Beta parameters n0 and x0 are obtained.

n0 = (M * (1 - M)) / V - 1

x0 = (M^2 * (1 - M)) / V - M

8.b. The mean (M') and variance (V') of the Bayesian posterior are calculated based on f observed
failures in N new demands:

M' = (x0 + f) / (n0 + N)

V' = ((x0 + f) * (n0 + N - x0 - f)) / ((n0 + N)^2 * (n0 + N + 1))

9. The Bayesian posterior distribution is lognormal since the Beta and lognormal distributions are
complementary. The lognormal error factor of the Bayesian posterior is determined.

EF = EXP(1.6448 * SQRT(LN((V'/M'^2) + 1)))

Note: The quantity 1.6448 is the z value (number of standard deviations) at the 95th percentile.
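
Steps 2, 8 and 9 above can be sketched in a few lines of Python. This is an illustrative sketch, not the study's spreadsheet: the percentile bounds are found from the equivalent binomial-tail definition by bisection, because the Python standard library has no inverse F function, and all function names are our own.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(func, target, lo=0.0, hi=1.0, iters=200):
    """Bisection for a function that decreases in p and crosses target on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def demand_bounds(failures, exposures, tail=0.05):
    """Step 2: 5th and 95th percentile bounds on a failure-on-demand frequency."""
    if failures == 0:
        lower = 0.0
    else:
        # Lower bound p satisfies P(X >= failures | p) = tail.
        lower = solve(lambda p: binom_cdf(failures - 1, exposures, p), 1 - tail)
    # Upper bound p satisfies P(X <= failures | p) = tail.
    upper = solve(lambda p: binom_cdf(failures, exposures, p), tail)
    return lower, upper

def beta_prior_from_moments(mean, var):
    """Step 8.a: Beta parameters n0 and x0 that preserve the prior mean and variance."""
    n0 = mean * (1 - mean) / var - 1
    x0 = mean * n0  # algebraically equal to M^2 * (1 - M) / V - M
    return n0, x0

def beta_posterior(n0, x0, failures, demands):
    """Step 8.b: posterior mean and variance after f failures in N new demands."""
    n1, x1 = n0 + demands, x0 + failures
    return x1 / n1, x1 * (n1 - x1) / (n1**2 * (n1 + 1))

def error_factor(mean, var):
    """Step 9: lognormal error factor from a mean and variance."""
    return math.exp(1.6448 * math.sqrt(math.log(var / mean**2 + 1)))

print(demand_bounds(0, 109))  # zero failures in 109 SRB-launches (Sensitivity3)
print(demand_bounds(1, 110))  # one failure in 110 SRB-launches (Sensitivity2)
```

These bounds apply to a single booster before any lognormal conversion, so they are not expected to match the RSRB pair rows of Table 4 directly.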

At several places in this process, a conversion is made from "raw" distributions of failure frequencies to
lognormal distributions. These conversions are done to facilitate calculations, and conversion to a
lognormal distribution was used both for ease of calculation and because the lognormal has long been
accepted as well suited to characterizing failure rate (failure frequency) distributions. In all cases, this
study uses the default conversions of the Computerized Aggregation of Reliability Parameters
(CARP™) aggregation and Bayesian update algorithms, which have been developed and used
extensively in Probabilistic Risk Assessments for a wide variety of applications.

The process used here was designed to preserve the mean and the relationship between the mean and the
worst-case (ninety-fifth percentile) value in the original distribution as much as possible. It is believed
that the Galileo study results differ from the Galileo-era interim results of this study largely because a
slightly different process, designed to preserve both of the extreme values (fifth and ninety-fifth
percentiles), was used in that study. As noted in Appendix IT, the disadvantage of that process is that the
central tendencies of the original distributions are lost. By preserving the means, "natural" and
"expected" relationships based on point value calculations using the mean values of the distributions are
preserved. For example, the mean of an aggregate distribution is the average of the means. This result
does not hold true if the mean is not preserved in converting distributions.

The exception to the rule of preserving the mean and the relationship between the mean and worst-case
values is in aggregation. The raw distribution resulting from aggregation is generally irregular and may
be multi-modal. (In contrast, the other "raw" distributions which are converted tend to look like the
lognormal: right skewed, bound by zero, and long-tailed.) If the mean and 95th percentile are used to
convert an aggregate distribution, the information in the other tail of that distribution may be unfairly
discounted. Since the purpose of aggregation is to return a readily-used distribution which accurately
reflects the experience of the set of aggregated surrogates as a class, and since the raw aggregate may
not look as much like a lognormal as the other distributions being converted, it is important that both
tails be equally represented in the converted aggregate distribution. For this reason the raw aggregate
distributions are converted preserving the mean and median of the distribution, the median being the
midpoint between the two extremes.
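
The two conversion rules described above (preserve the mean and error factor, or preserve the mean and median) can be sketched with the standard lognormal relations. This is an illustration under that assumption and is not the CARP™ algorithm itself; the function names are our own.

```python
import math

Z95 = 1.6448  # z value at the 95th percentile, as used in the text

def lognormal_from_mean_ef(mean, ef):
    """Lognormal parameters (mu, sigma) preserving the mean and error factor."""
    sigma = math.log(ef) / Z95
    mu = math.log(mean) - sigma**2 / 2
    return mu, sigma

def lognormal_from_mean_median(mean, median):
    """Lognormal parameters (mu, sigma) preserving the mean and median (mean > median)."""
    mu = math.log(median)
    sigma = math.sqrt(2 * math.log(mean / median))
    return mu, sigma

def percentile(mu, sigma, z):
    """Value of the lognormal at z standard deviations from the median."""
    return math.exp(mu + z * sigma)

# Hypothetical example: a distribution with mean 2.0 and median 1.0.
mu, sigma = lognormal_from_mean_median(2.0, 1.0)
print(percentile(mu, sigma, -Z95), percentile(mu, sigma, 0.0), percentile(mu, sigma, Z95))
```

With the mean-and-median rule the fifth and ninety-fifth percentiles sit symmetrically about the median on a logarithmic scale, which is the sense in which both tails are equally represented.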


Shuttle PRA Phase 1 - Galileo Study Update

Revision  1 


The  Solid  Rocket  Booster  Surrogate  Prior: 

The  surrogate  data  used  to  generate  the  prior  distribution  for  the  Galileo  study  is  shown  in  Table  5 
below.  A homogeneity  analysis  was  performed  for  the  Galileo  study  to  determine  whether  a better  prior 
could  be  arrived  at  parametrically,  based  on  the  reliability  of  the  surrogate  systems  with  respect  to 
diameter, length, and thrust. No statistically meaningful relationship between these parameters was
found, so a simple aggregate of the surrogate failure frequency distributions was used to create the prior.
A more detailed parametric analysis of solid rocket motors was subsequently performed at Brookhaven
National Laboratory ("NASA Reliability Database and SRB Failure Probability Assessment") [7]. This
analysis did find a weak but statistically significant correlation between length, diameter, average thrust
and failure frequency, but did not provide a statistically useful relationship which might have improved
on the aggregate prior distribution. Reference 7 also showed that, for the data they had, there was no
significant correlation between burn time and failure probability, indicating that a demand-related,
rather  than  time-related  approach  was  appropriate  for  SRBs. 


Table 5: Aggregation of RSRB Surrogates
U.S. Solid Rocket Motor Experience prior to August, 1988

| | 5th % | 20th % | 50th % | Mean | 80th % | 95th % |

Castor: 2 failures in 1640 flights (No softgoods in design)
| Castor | 1.42E-04 | 3.18E-04 | 7.38E-04 | 1.22E-03 | 1.72E-03 | 3.84E-03 |

Star: 9 failures in 1887 flights (No softgoods in design)
| Star | 2.36E-03 | 3.21E-03 | 4.43E-03 | 4.77E-03 | 6.12E-03 | 8.33E-03 |

Minuteman: 12 failures in 806 flights (Softgoods in design)
| Minuteman | 8.32E-03 | 1.08E-02 | 1.4?E-02 | 1.49E-02 | 1.86E-02 | 2.40E-02 |

Poseidon / Trident: 4 failures in 380 flights (No softgoods in design)
| Poseidon / Trident | 3.10E-03 | 5.11E-03 | 8.64E-03 | 1.05E-02 | 1.46E-02 | 2.41E-02 |

Titan: 1 failure in 52 flights (Softgoods in design)
| Titan | 2.65E-04 | 8.01E-04 | 2.55E-03 | 6.?8E-03 | 8.13E-03 | 2.46E-02 |

| Aggregate SRB Prior | 3.03E-04 | | 5.08E-03 | 7.?9E-03 | | 2.11E-02 |

The  data  in  Table  5 are  presented  graphically  in  Figure  3 below,  along  with  the  1993  RSRB  failure 
frequency  distributions  for  the  baseline  and  various  sensitivity  cases.  Note  that  as  the  mean  value  and 
uncertainty of the distribution associated with the RSRB (in particular, Sensitivity3) are reduced due to
failure free flights, the probability that the Minuteman missile is in the same statistical population as the
RSRB  decreases.  This  fact  is  significant  because  the  Minuteman  distribution  drives  the  aggregate 
distribution mean up, and increases the uncertainty associated with the aggregate prior distribution.
Currently  we  include  the  Minuteman  experience  with  the  other  surrogate  solids  because  there  is  no 
compelling  engineering  or  statistical  reason  to  believe  that  the  Minuteman  is  a less  appropriate  surrogate 
than  any  of  the  other  solid  surrogates.  If  there  were  a strong  statistical  justification  for  removing  the 
Minuteman  experience  from  the  set  of  RSRB  surrogates,  both  the  mean  and  uncertainty  associated  with 
the  RSRB  prior  would  be  significantly  reduced.  Determining  how  many  failure  free  flights  of  the 
RSRBs  are  required  to  ensure  that  Minuteman  missiles  are  not  a valid  indicator  of  RSRB  performance, 
then  assessing  the  effect  of  that  knowledge  on  our  understanding  of  STS  catastrophic  failure  and  the 
relative  contribution  of  RSRB  risk  versus  SSME  risk,  is  recommended  for  further  analysis. 
Figure 3: Failure Frequency Distributions for the RSRB and Surrogates.

(For a single SRB)

[Figure: RSRB and Surrogates Failure Frequency Distributions. Vertical axis: failure frequency, 0.00E+00 to 3.50E-02. Markers show the 95th %, Mean, and 5th % of each distribution. Annotation: "Note that the Minuteman surrogate may not be indicative of RSRB population given a few more RSRB successes."]

Space Shuttle Main Engine (SSMEs)

The SSME is a continuously evolving system. To date, there have been four major implementations of
the SSME: the current, Phase II engine, used on STS-26 and subsequent flights; the Phase I engine, used
on flights 6 through 25; the First Manned Orbital Flight (FMOF) engine, used on flights 1 through 5; and
the Pre-FMOF engine, which has never flown. The test exposure of all engine configurations is used,
and major failures that have occurred are examined on a case-by-case basis to determine whether that
failure would have occurred, and resulted in catastrophic damage, in a current flight (operational)
system. The Pre-FMOF engine is considered sufficiently different from the later versions of the SSME
that no Pre-FMOF failures are used to determine the SSME failure frequency prior distribution.

SSME catastrophic failures are considered in two groups, start-up failures and mainstage failures.
Start-up failures are those which appear to be demand-related, and mainstage failures are time-related. The
SSME failure history follows the well-known "bathtub curve" of infant mortality, random failure, and
wearout, but the existing SSME test program appears to be doing an adequate job of preventing infant
mortality or wearout failures in operational engines. Specifically, the "Green Run" program appears to
weed out infant mortality failures before operational exposure, and the "Fleet Leader" program
identifies wearout failure modes and attempts to ensure that operational components are not exposed in
the wearout regime. SSME failures are therefore treated as random events, and the associated failure
frequencies (per start for start-up failures, per second of run time for mainstage) are treated as constants.
SSME Exposure:

The test exposure of the engines is listed below in Table 6.


Table 6: SSME Test Exposure

| Engine Configuration | Ground Test Seconds | Number of Starts |
| Pre FMOF | 64,359 | |
| FMOF | 38,764 | |
| Phase I | 98,191 | |
| Phase II (Galileo Study) | 92,934 | |
| Galileo Study Total | 294,248 | 789 |
| Phase II (since Galileo Study) | 190,702 | 471* |
| Total | 484,950 | 1260 |

* Excludes tests terminated in less than 4 seconds.

The  exclusion  of  test  starts  where  the  test  was  terminated  in  less  than  4 seconds  was  to  ensure  that  only 
those  tests  which  exposed  the  engine  to  the  full  start-up  cycle  were  included  in  the  count  of  start 
exposures.  In  most  cases  the  short  terminations  were  the  result  of  a test-related  problem  or  error.  In  no 
case  was  a catastrophic  or  potentially  catastrophic  failure  excluded  by  this  filter. 


SSME  Failures: 

There  have  been  no  catastrophic  failures  of  an  in-flight  SSME,  although  one  major  incident  on  the 
eleventh  mission  (STS-41C)  could  have  resulted  in  a catastrophic  failure  if  a programmed  normal 
shutdown of the SSME had not occurred in time to prevent it. All catastrophic SSME failures to date
have  occurred  during  testing.  To  determine  the  catastrophic  failure  frequency  of  the  SSME,  a prior 
failure  frequency  distribution  was  generated  based  on  the  SSME  test  performance,  then  updated  with  the 
operational  experience  of  the  Shuttle.  Like  the  RSRBs,  only  catastrophic  (uncontained)  failures,  or 
those  failures  which  could  have  led  to  catastrophic  failure  are  included  in  the  count  of  failures. 

At  the  time  of  the  Galileo  study  there  had  been  37  test  and  flight  failure  events,  of  which  3 were 
ultimately  considered  to  be  applicable  to  in-flight  failure  frequency  determination: 

During  test  750-160  on  12  February  1982,  a blockage  of  the  fuel  supply  as  a result  of  ice 
formation  occurred  during  start-up.  Both  high  pressure  turbines,  the  hot  gas  manifold  (HGM), 
the main injector, the main combustion chamber (MCC), and the nozzle were burned as a result.
This failure could recur in flight, but only during startup.

During  the  eleventh  flight  (STS-41C)  on  3 February  1984,  the  augmented  spark  ignitor  (ASI) 
chamber  experienced  erosion  due  to  a drill  chip  lodged  in  an  ASI  orifice.  The  engine  was  cut 
off  by  pre-programmed  command  at  the  nominal  Main  Engine  Cutoff  before  the  failure  could 
propagate.  An  ASI  fuel  filter  was  subsequently  added  to  the  supply  line,  so  the  probability  of 
recurrence  of  the  incident,  and  of  its  becoming  catastrophic,  is  diminished  but  still  not  zero. 
During test 902-4[illegible] on 1 July 1987 a crack in the oxidizer pre-burner (OPB) interpropellant
plate resulted in the formation and buildup of ice blocking the fuel supply, which altered the
OPB exhaust flow distribution and burned through the liner, causing faceplate erosion and high
pressure oxidizer turbo-pump (HPOTP) turbine end damage. The failure was caused by cracks
in the interpropellant plate-to-element braze joints. The cracks allowed propellant mixing and
caused ice contamination to form in the fuel manifold. The failure was determined to be the
result of poor braze joints made during manufacture.

Since the Galileo study there have been three additional major incidents, of which one is considered to
be applicable to in-flight failure frequency determination. The complete text of these incidents is
included in Appendix F.

During test 902-4[illegible] on 2 June 1989 an internal pressure restraint in one of the flex joints in the
LPFTP discharge duct failed, releasing a half pound ball into the flow which ruptured the nickel
plating of the duct, causing a fire. This failure is counted as an applicable failure event.

During test 904-0[illegible] on 23 June 1989 a HPOTP bearing failed during a 109% rated power level
(RPL) extended duration burn. This failure is not counted because it occurred after 1270
seconds of continuous operation and at 109% RPL.

During test 901-69[illegible] on [illegible] July 1991 a second stage turbine blade in the HPFTP failed
(disassembled) at [illegible]17 seconds into the test and less than 100% RPL. This failure is not counted
because the root cause (internal microshrinkage porosity allowed hydrogen embrittlement inside
the blade) is age related, and the HPFTP blades were "fleet leaders" and had accumulated 61
starts and 25,143 seconds of exposure, well beyond the limits allowed for flight components.

To compute the prior failure frequency distributions based on this test data, the start-up incident was
treated as a demand related failure and the remaining failures were treated as time-related random
failures (Mainstage failures). The two Mainstage failures that were counted in the Galileo study were
treated parametrically, based on the conditional probability of a catastrophic (loss of the Shuttle)
accident given that the failure had occurred. This treatment is described in the Galileo study as follows:

"The two Mainstage (Phases 1 and 2) OPB failures identified as major incidents ... were treated
as follows. The uncertain conditional probability of the recurrence in flight of each incident
resulting in a catastrophic failure was assigned parametric values of 0, 0.5, and 1.0, giving rise to
an effective number of catastrophic failures of 0, 1, and 2, respectively ... A Poisson
distribution was determined for each case."

"The three cases bounded the expert judgment that the two oxygen preburner failures during
tests could have been catastrophic if they had occurred during flight; i.e., they bounded the
modeling uncertainty for the Mainstage catastrophic failure probability."

"The three distributions were then aggregated into an average distribution assuming that each
case was equally likely to be true." [1]
The test failures (including the potential failure on STS-41C) were combined with the test exposure to
generate prior distributions which were then updated using Bayes' Theorem with the actual flight
experience. The start-up failure frequency distribution was treated as an on-demand failure, using the
process described earlier. The mainstage failures were treated as time-related using the process
described in detail below:

Treatment of Time Related Failures (SSME Mainstage Failures):

1.  The  two  Mainstage  failures  from  the  Galileo  study  were  combined  with  the  new  Mainstage 
failure  using  the  aggregation  method  from  the  Galileo  study.  The  conditional  probability  of 
catastrophic  Shuttle  failure  given  the  new  Mainstage  failure  (the  LPF  duct  failure)  was 
conservatively  set  to  1.  This  resulted  in  aggregating  three  distributions  based  on  1 failure,  2 failures, 
and  3 failures,  vice  the  0, 1,  and  2 failures  of  the  earlier  study.  The  accumulated  test  time  used  was 
484,932  seconds. 

2.  The  three  distributions  (corresponding  to  1, 2,  and  3 failures  in  484,932  seconds)  were 
determined  assuming  that  failures  occurred  following  a Poisson  process  with  a constant  failure  rate 
X. 

2a.  The  mean,  fifth-percentile  (LOWER),  and  ninety-fifth-percentile  (UPPER)  of  the  distributions 
for  time-related  failure  frequencies  were  calculated  in  terms  of  the  Chi-square  distribution. 

I 

MEAN  = Failures  / Exposure 

LOWER  * CH 1 1 NV(0.95,  2*Failures)  / (2*Exposure) 

UPPER  = CHHNV(0.05, 2*Failures  + 2)  / (2*Exposure) 

CHIINV  is  the  inverse  Chi-Square  distribution,  with  the  parameters  "percentile",  "degrees  of 
freedom". 

The  derivation  of  these  equations  in  provided  in  Appendices  B and  C. 

3. The resulting distributions were converted to lognormal, preserving the mean and error factor
(see Appendix C).
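
The chi-square bounds of step 2a can be sketched in Python. This is a minimal illustration rather than the study's spreadsheet: the function names (`chi2_upper_tail_even`, `chiinv`, `rate_bounds`) are hypothetical, `chiinv` follows the spreadsheet convention in which the first argument is the upper-tail probability, and the closed-form tail sum is valid only for even degrees of freedom, which always holds for 2*Failures and 2*Failures + 2.

```python
import math


def chi2_upper_tail_even(x, df):
    """Upper-tail probability of a chi-square variable with EVEN df (closed form)."""
    k = df // 2
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j      # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def chiinv(p, df):
    """Spreadsheet-style inverse: the x whose upper-tail probability equals p."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail_even(hi, df) > p:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):                      # plain bisection
        mid = 0.5 * (lo + hi)
        if chi2_upper_tail_even(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rate_bounds(failures, exposure):
    """Return (MEAN, LOWER, UPPER) for a time-related failure rate."""
    if failures < 1:
        raise ValueError("this sketch assumes at least one failure")
    mean = failures / exposure
    lower = chiinv(0.95, 2 * failures) / (2.0 * exposure)
    upper = chiinv(0.05, 2 * failures + 2) / (2.0 * exposure)
    return mean, lower, upper
```

For example, `rate_bounds(1, 484932.0)` returns the mean, 5th-percentile, and 95th-percentile failure rates per second for one failure in the accumulated test time.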


Other  risk  contributors: 

There  were  no  significant  new  data  sources  or  other  indications  that  the  failure  frequency  distributions 
calculated  for  the  Orbiter,  External  Tank,  and  Prelaunch  in  the  Galileo  study  required  recalculation.  For 
the  sake  of  uniformity  a Bayesian  update  of  these  distributions  was  performed  using  the  launch 
experience  to  date,  although  this  update  had  little  effect  on  the  STS  system  level  failure  frequency 
distributions. 


Combining  risk  contributors: 

The  failure  frequency  distributions  for  the  major  risk  contributors  were  combined  to  produce  the  overall 
STS  failure  frequency  distribution  using  a Monte-Carlo  type  simulation.  A commercially  available 
simulation tool, Crystal Ball™ by Decisioneering, was used to perform the simulations within the same
Excel™ spreadsheet environment that was used for the other calculations in this study. The starting
failure frequency distributions for each of the risk elements (RSRB, SSME, Orbiter, ET, Prelaunch)
were converted to lognormals to facilitate sampling from the distribution. The actual sampling technique
used was the Latin-Hypercube, as it was found that this technique modeled the tails of the distributions
more accurately than Monte Carlo sampling. Appendix 2 describes the differences between Monte-
Carlo and Latin-Hypercube sampling. The simulation model is described below:


Combining Failure Frequencies from the Risk Elements: The Simulation Model

1. The failure frequency for the RSRB pair was found by converting the RSRB failure frequency to
a reliability value and squaring it. The resulting RSRB (Pair) reliability was then converted back to a
failure frequency. In the simulation these calculations were repeated for each sampled value from
the input failure frequency distributions:

R_SRB = EXP(-SRB Failure Frequency)
(Probability of no SRB failure)

R_SRB-Pair = R_SRB * R_SRB
(Probability that neither SRB will fail)

SRB (Pair) Failure Frequency = -LN(R_SRB-Pair)
(Frequency of SRB failure in flight)

2. The failure frequency distributions for the SSME and SSME Cluster were found by running a
simulation in which the updated failure-on-demand (start) and (updated) time-related-failure
(Mainstage) frequencies were sampled. The sampled frequencies were converted to reliability
values. The reliabilities were combined (multiplied) as indicated, and the corresponding
failure frequency calculated.

R_Start = EXP(-Start Failure Frequency)
(Probability of no catastrophic SSME failure at startup)

R_Mainstage = EXP(-Mainstage Failure Frequency * 520 seconds)
(Probability of no catastrophic SSME failure during ascent)

R_SSME = R_Start * R_Mainstage
(Probability of no catastrophic SSME failure)

SSME Failure Frequency = -LN(R_SSME)
(Frequency of catastrophic SSME failures (per SSME per flight))

R_SSME-Cluster = R_SSME^3

SSME Cluster Failure Frequency = -LN(R_SSME-Cluster)

3. The updated failure frequencies for each of the risk elements were then converted to reliability
values and combined (multiplied) to produce the STS catastrophic reliability. STS Reliability was
converted back into a failure frequency. As above, these calculations were performed for each set of
samples from the input failure frequency distributions.

R_STS = R_SRB (Pair) * R_SSME (Cluster) * R_Orbiter * R_ET * R_Prelaunch

STS Failure Frequency = -LN(R_STS)

4. The resultant (STS Failure frequency) distributions were not converted to lognormal since no
further calculations with these distributions were anticipated.
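
The frequency-to-reliability round trips described in the simulation model can be sketched for a single set of sampled values as follows. This is an illustrative sketch only: the function names are hypothetical, and any numeric inputs passed to them are placeholders rather than values from the study. In the study itself these calculations were repeated inside the spreadsheet for every sampled set.

```python
import math


def reliability(frequency):
    """Probability of no failure for a given failure frequency."""
    return math.exp(-frequency)


def frequency(reliability_value):
    """Failure frequency corresponding to a reliability value."""
    return -math.log(reliability_value)


def srb_pair_frequency(srb_frequency):
    """Step 1: square the single-SRB reliability, then convert back."""
    r_srb = reliability(srb_frequency)
    return frequency(r_srb * r_srb)


def ssme_cluster_frequency(start_frequency, mainstage_rate,
                           burn_seconds=520.0, engines=3):
    """Step 2: combine start and mainstage reliabilities for a three-engine cluster."""
    r_start = reliability(start_frequency)
    r_mainstage = reliability(mainstage_rate * burn_seconds)
    r_ssme = r_start * r_mainstage
    return frequency(r_ssme ** engines)


def sts_frequency(element_frequencies):
    """Step 3: multiply the element reliabilities and convert back."""
    r_sts = 1.0
    for element_frequency in element_frequencies:
        r_sts *= reliability(element_frequency)
    return frequency(r_sts)
```

Because the negative logarithm of a product of exponentials equals the sum of the exponents, each round trip is equivalent to adding the underlying frequencies for a given sample set; the distributional result comes from repeating the calculation over many sampled sets.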
Mathematical Tools and Methods:

All of the calculations in this study were performed within the framework of a prototype Space Systems
Data Workstation (SSDW™) being developed for NASA by SAIC. The calculation of prior
distributions, the aggregation of distributions, and Bayesian updating were performed using CARP2™, a
prototype version of SAIC's CARP™ (Computerized Aggregation of Reliability Parameters) which was
developed for use within the SSDW™ environment. The core mathematical engine was Microsoft
Excel version 4.0 with a variety of enhancements (add-in functions) developed for the SSDW™.
Decisioneering's Crystal Ball™, a commercial (off-the-shelf) simulation tool for use within Excel™, was
also used extensively. @Risk™, an alternative simulation product from Palisades Software, was used to
ensure that no systematic errors were introduced by the use of Crystal Ball™. Appendix A contains an
annotated copy of the spreadsheet in which all of the core calculations for this study were performed.

The  simulations  used  throughout  this  study  used  20,000  trials.  This  number  of  trials  was  found  to  be 
sufficient  to  ensure  convergence  at  the  tails  of  the  resultant  (forecast)  distributions.  Specifically,  it  was 
found that 20,000 trials was sufficient to keep the standard deviation of the 5th and 95th percentiles to
less  than  5%  when  performing  multiple  runs  using  the  same  input  data  and  different  random  number 
generating  "seeds". 
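
The seed-to-seed convergence check described above can be sketched as follows. This is a one-dimensional illustration under stated assumptions, not the Crystal Ball™ algorithm: the function names are hypothetical, the lognormal is parameterized by mean and error factor (95th percentile divided by median), and any mean or error factor supplied is a placeholder.

```python
import math
import random
import statistics


def lognormal_lhs(mean, error_factor, trials, seed):
    """Latin hypercube sample (one variable) of a lognormal with given mean and error factor."""
    sigma = math.log(error_factor) / 1.645
    mu = math.log(mean) - 0.5 * sigma * sigma
    rng = random.Random(seed)
    normal = statistics.NormalDist()
    samples = []
    for stratum in range(trials):
        # one uniform draw inside each of the equal-probability strata
        u = (stratum + rng.random()) / trials
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        samples.append(math.exp(mu + sigma * normal.inv_cdf(u)))
    rng.shuffle(samples)
    return samples


def percentile(samples, fraction):
    """Simple empirical percentile by sorted index."""
    ordered = sorted(samples)
    index = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]


def relative_spread(mean, error_factor, trials, seeds, fraction):
    """Standard deviation of a percentile across seeds, relative to its average."""
    values = [percentile(lognormal_lhs(mean, error_factor, trials, seed), fraction)
              for seed in seeds]
    return statistics.pstdev(values) / statistics.mean(values)
```

Calling `relative_spread(..., fraction=0.05)` and `relative_spread(..., fraction=0.95)` over several seeds gives the quantity that the study required to stay below 5%.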

Conclusion: 

The  principal  conclusions  of  this  study  are:  (1)  The  Space  Shuttle  today  is  demonstrably  as  reliable  as 
any  other  launch  vehicle,  and  under  the  reasonable  assumptions  of  this  study,  more  reliable  than  any 
other.  (2)  The  Space  Shuttle  Main  Engine  (SSME)  test  program  has  had  a significant  positive  impact  on 
the  reliability  and  crew  safety  of  the  Shuttle.  (3)  The  Redesigned  Solid  Rocket  Booster  (RSRB)  is  the 
most  significant  contributor  to  the  estimated  residual  risk  of  catastrophic  failure  to  the  Shuttle  among 
the  major  elements  considered  in  this  study  (RSRB,  SSME,  External  Tank  (ET),  Orbiter,  and 
Prelaunch),  since  the  only  opportunity  to  demonstrate  reliability  improvement  in  the  RSRB  is  through 
flight  experience. 

Comparison of STS Catastrophic Reliability with Other Launch Systems:

The  scope  of  this  analysis  was  to  determine  the  catastrophic  failure  probability  of  the  Shuttle  system 
during  prelaunch,  launch,  and  ascent.  While  this  is  not  the  same  as  mission  reliability  (the  probability 
of  successfully  completing  the  mission),  it  is  essentially  equal  to  the  probability  of  either  completing  the 
mission  or  returning  the  payload  intact  for  another  launch.  The  unique  ability  of  the  Shuttle  to  return  a 
payload  means  that,  for  most  purposes,  the  catastrophic  failure  probability  is  the  correct  value  to 
compare  with  the  mission  reliability  of  expendable  launch  vehicles  (ELVs).  This  study  concludes  that, 
at ninety percent certainty, the current catastrophic reliability of the Shuttle is between 0.969 (1/31) and
0.996 (1/223), with a mean of 0.986 (1/73), and median of 0.989 (1/90). The same quantities, when
calculated based on a simple binomial for 1 failure in 55 launches are: 0.917 (1/12) (Lower) to 0.999
(1/1111) (Upper) with a binomial mean of 0.982 (1/55).
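
The binomial comparison can be explored with a short sketch. The report does not state exactly how the binomial bounds were computed, so this sketch assumes exact one-sided 5% and 95% binomial bounds on the per-launch failure probability; the function names are hypothetical, and the results should be read as an approximate cross-check of the quoted figures rather than a reproduction of them.

```python
import math


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))


def solve_decreasing(func, target, lo=0.0, hi=1.0, iterations=200):
    """Bisection for func(p) == target, where func decreases as p increases."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def failure_probability_bounds(failures, launches, tail=0.05):
    """Return (lower, point estimate, upper) for the per-launch failure probability."""
    # Upper bound: the p at which seeing this few failures has probability `tail`.
    upper = solve_decreasing(lambda p: binomial_cdf(failures, launches, p), tail)
    if failures == 0:
        lower = 0.0
    else:
        # Lower bound: the p at which seeing this many failures or more has probability `tail`.
        lower = solve_decreasing(
            lambda p: binomial_cdf(failures - 1, launches, p), 1.0 - tail)
    return lower, failures / launches, upper
```

Calling `failure_probability_bounds(1, 55)` gives failure probabilities whose complements can be compared with the Lower, mean, and Upper reliabilities quoted above.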

The Shuttle results are compared with other launch vehicles in Figure 4, based on data in the letter from
P. Rutledge to W. Frazier of Code QS [9]. The failure data was used to derive a binomial which was
then fit to a lognormal. ELVs with no failures were updated using Bayes' theorem as indicated.
Figure 4: Reliability Comparison of Active Launch Vehicles

In Figure 4 the top axis shows the number of failures / number of exposures, and the "as-of" date of the
data. The left-hand axis shows the failure frequency, and the right-hand axis shows the corresponding
reliability. The 5th percentile, mean, and 95th percentile are shown for each launch vehicle. The first
two entries compare the Shuttle reliability as analyzed in this study and the Shuttle reliability
determined using the binomial -> lognormal conversion based on 1 failure in 55 launches. The
Pegasus launch vehicle distribution reflects a Bayesian update using the Minuteman missile, the core of
Pegasus. Since the Pegasus includes some new hardware and must be successfully deployed from a B-52
prior to engine ignition, the distribution shown probably reflects an optimistic assessment of the
actual uncertainty. The launch vehicles are listed in order of increasing mean failure rate. The Shuttle
has clearly demonstrated higher reliability than any other active orbital launch system.

It should be noted that reliability growth was not modeled for the mature launch vehicles in this list, as
that was beyond the scope of this study. However, a point check of the Delta launch vehicle using the
AMSAA (Army Materiel Systems Analysis Agency) growth model parameters derived in Space Launch
Reliability Growth [10] yields a mean reliability of 0.977 (instantaneous failure probability of 0.023 per
launch), based on the [illegible] launches as of 2/92. This is still well short of the 0.986 reliability computed
for Shuttle.


Impact of the SSME Test Program:

The principal source of demonstrated reliability improvement in this study relative to the Galileo-era
results is increased confidence in the SSME. Because of the test program, the SSME has been
accumulating statistically relevant experience at the rate of 4 or 5 equivalent flights per mission. This
"experience multiplier" has the effect of improving confidence in the reliability of the SSME much
more quickly than if the SSME were exposed to failure only during actual launches. In contrast, the
engines in an ELV are only exposed during an operational launch, so (like the Shuttle RSRB) the
confidence in their reliability performance builds relatively slowly.

Although the test program has served to improve statistical certainty in the SSMEs, this is a relatively
minor secondary benefit. The primary benefits of the test program, and its principal objectives, are to
"weed out" infant mortality failures in new components, and to determine what components are subject
to wearout or life limits, and set operational limits well short of the wear-out region. It is because this is
the primary purpose that the SSME test data is not pooled directly with the SSME operational
experience for this study: SSME testing is deliberately more strenuous than the operational
environment.


Shuttle PRA Phase 1 - Galileo Study Update

References:

1. Bloomquist, C., et al.; Independent Assessment of Shuttle Accident Scenario Probabilities
for the Galileo Mission; PRC, NASA HQ Code QS; April 1989

2. Safie, F. and Heard, B.; Space Shuttle Main Engine Reliability Analysis; NASA
MSFC/C[illegible]; May 1993

3. Biggs, R., et al.; "SSME Reliability Determination" (Viewgraph Presentation); Rockwell
International, Rocketdyne Division; December 1990

4. Brodowski, R., Stutzke, M., et al.; Galileo [illegible] Risk Assessment Data Analysis Final
Report; SAIC, NASA HQ Code QS; May 1992

5. McFadden, R., et al.; Program Plan: Probabilistic Risk Assessment of the Space Shuttle
(Shuttle PRA) Rev 5; SAIC; June 1993

6. Shuttle Probabilistic Risk Assessment Proof of Concept Study

7. Hsu, F. and Azarm, M. A.; NASA [illegible] Database and SRB Failure Probability
Assessment; Brookhaven National Laboratory; 1992

8. Galvagni, Fragola, Antona; Risk Assessment in Design; Unpublished manuscript, to be
presented at the SRA Conference in Rome, Italy, October 1993

9. Rutledge, P.; Letter to W. Frazier of NASA Code QS

10. Cotta, R. and Kisko, W.; Space Launch Reliability Growth - Database, Analyses,
Prediction Methodologies; SPARTA, Inc.; March 1992

Appendix  A: 

Annotated  copy  of  the  spreadsheet  "RTGUPDT2.XLS" 

All calculations used in this report were performed in this spreadsheet.

CARP Galileo Baseline

Appendix B


BAYES  ESTIMATORS 
Introduction 

Classical statistics for point-estimation problems assume that the random variable representing the
outcome of random experiments comes from some density $f(\cdot\,;\theta)$, where the function $f$ is
known. Additionally it is assumed that the parameter $\theta$, for which an estimation is
desired, is a fixed constant, unknown to us.

In many situations, however, there is additional information available about the unknown
parameter $\theta$. For example, one may have the evidence (e.g., through considerable experience) that
$\theta$ acts as a random variable for which a realistic density function can be postulated, provided
that this experience is believed to be relevant for the present situation or population. The
following sections address a method to incorporate this additional information in the estimation
process.
Bayes Posterior Distribution

If the parameter $\theta$ is the value of a random variable $\Theta$, then the density function of a random
variable $X$ is $f(x \mid \theta)$, that is, a conditional density, the density of $X$ given $\Theta = \theta$.

Let us assume that the density function of $\Theta$, $g(\theta)$, is known and completely specified with no
unknown parameters, and let there be a random sample of size $N$: $X_1, X_2, \ldots, X_N$. The objective is to
find an estimate for $\theta$. For example, $\Theta$ can represent the failure per demand of a certain
component, and the sample $X_1, X_2, \ldots, X_N$ the outcome of each demand trial, that is, failure or
success. The classical estimate of $\theta$ is a single expression that includes the observed sample
$X_i$ and the form of the sample density $f$. Now a procedure is needed that contains all the
information that the classical estimate contains plus the new information of the known density of
$\Theta$.

Prior to obtaining the sample, all the information available about $\theta$ is that it comes from the
distribution $g(\theta)$, therefore called the prior distribution. After taking the random sample (e.g.,
utilizing failure records), a new distribution is needed, which summarizes the prior distribution and
the outcome of the actual sample, and it is called the posterior distribution
$f(\theta \mid x_1, x_2, \ldots, x_N)$, that is, the posterior distribution of $\Theta$ given
$X_1 = x_1, X_2 = x_2, \ldots, X_N = x_N$.

For random sampling, this new distribution is given by Bayes' theorem, as follows [Ref. 1,
pg. 341]:

$$f(\theta \mid x_1, x_2, \ldots, x_N) = \frac{f(x_1, x_2, \ldots, x_N \mid \theta)\, g(\theta)}{f(x_1, x_2, \ldots, x_N)} = \frac{\left[\prod_{i=1}^{N} f(x_i \mid \theta)\right] g(\theta)}{\int \left[\prod_{i=1}^{N} f(x_i \mid \theta)\right] g(\theta)\, d\theta} \tag{1}$$

After this posterior distribution is obtained, the corresponding Bayes estimator can be computed as
the mean value of $\theta$, that is:

$$\hat{\theta} = \frac{\int \theta \left[\prod_{i=1}^{N} f(x_i \mid \theta)\right] g(\theta)\, d\theta}{\int \left[\prod_{i=1}^{N} f(x_i \mid \theta)\right] g(\theta)\, d\theta} \tag{2}$$

It is stressed that the Bayes procedure lies in the complete specification of the prior distribution. If
the past experience is sufficiently extensive, then a reasonable prior probability distribution can be
assumed, provided that past experience is relevant to the present case. If past experience is not
available, engineering knowledge about the design, fabrication, material, and environment of the
components can be used to select the prior. The choice of this prior distribution often involves the
additional consideration of mathematical convenience. A flexible distribution family which is easy
to handle and which can approximate past experience by choosing the appropriate parameters is
often selected as a prior distribution. The fact that a particular prior is selected generally does not
involve the belief that the parameter is actually distributed that way, but it does mean that such a prior
fits the data reasonably well and is mathematically convenient.

A common selection for prior distributions is the so-called conjugate priors, which have the
property that the posterior and prior distributions are members of the same family of distributions;
therefore, the posterior function has a closed form analytical representation. Now, one question
arises: How is a conjugate prior identified? The answer depends on the problem being solved. A
mathematical procedure exists which finds pairs of distributions (for the prior and random
experiment) that produce a posterior distribution of the same family as the prior. For the case
being studied in this report, the distribution of the random variable which describes the experiment
is known (Bernoulli for failure-on-demand and Poisson for time failure rates) and therefore the
appropriate conjugate prior distribution has to be found, keeping in mind that such priors should
represent the failure rates fairly well. The following sections discuss two cases of conjugate
priors.
Beta Prior Distribution

The most widely used prior distribution for the failure-on-demand probabilities $p$ is the Beta
distribution. There are two main reasons for this choice. First, the Beta distribution has the same
range as $p$, that is, the interval (0, 1), giving flexibility to represent any failure characteristic within
this range. Second, its mathematical tractability, being a conjugate prior, as this section will
demonstrate.

Let us restate the problem to solve. An estimate for failure-on-demand probability $p$ is needed for
a particular group of components. It is assumed, from past experience or expert opinion, that the
components are part of a larger population whose failure-on-demand probabilities are distributed
according to the Beta distribution, completely specified by any two parameters (generic data).
Additionally, a random sample was obtained from the group of components (data records
analyzed) so that the total number of failures is known, as well as the number of demand trials
(plant specific data).

The prior distribution $g(p)$ is the only information available before the sample is obtained, and is
given by the Beta distribution [See Appendix, equation (A.1)]:

$$g(p) = \frac{1}{B(x_0, n_0 - x_0)}\, p^{x_0 - 1} (1 - p)^{n_0 - x_0 - 1} \tag{3}$$

where $0 < p < 1$, $n_0 > x_0 > 0$, and $B$ is the beta function given by [See Appendix, equation (A.2)]:

$$B(z, w) = \int_0^1 t^{z-1} (1 - t)^{w-1}\, dt$$

The mean $M$ and variance $V$ of this distribution are given by [see Appendix, equations (A.7) and
(A.11)]:

$$M = \frac{x_0}{n_0} \tag{4}$$

$$V = \frac{x_0 (n_0 - x_0)}{n_0^2 (n_0 + 1)} \tag{5}$$

From equation (4), it is seen that $x_0$ can be interpreted as failures for the prior distribution and $n_0$
as demands, therefore called "pseudo failures" and "pseudo demands," respectively.

The prior distribution is usually specified by a mean and variance, rather than the parameters $n_0$
and $x_0$. A conversion is necessary to obtain $n_0$ and $x_0$ given mean and variance.

From equation (4):

$$x_0 = M n_0 \tag{6}$$

Inserting equation (6) into (5):

$$V = \frac{M n_0 (n_0 - M n_0)}{n_0^2 (n_0 + 1)} = \frac{M (1 - M)}{n_0 + 1}$$

Now, solving for $n_0$:

$$n_0 = \frac{M (1 - M)}{V} - 1 \tag{7}$$

So equations (6) and (7) allow the conversion.

Let us analyze the random experiment performed to update the prior distribution.
Assuming that the failure-on-demand probability $p$ for each component is constant at each demand,
and $X_i$ is the outcome of the $i$th demand trial, that is, failure or success (numerical values 1 or 0,
respectively), then $X_i$ follows the Bernoulli distribution. Therefore, the density function for $X_i$,
given a failure-on-demand probability $p$, is:

$$f(x_i \mid p) = p^{x_i} (1 - p)^{1 - x_i}$$

Now, the joint density of an independent random sample of $N$ of such trial demands is given by:

$$f(x_1, x_2, \ldots, x_N \mid p) = \prod_{i=1}^{N} f(x_i \mid p) = p^{\sum x_i} (1 - p)^{N - \sum x_i} = p^{f} (1 - p)^{N - f} \tag{8}$$

where $f = \sum x_i$ is the total number of failures observed. Note that $f$, being the sum of Bernoulli
variables, is distributed according to the Binomial.

The denominator of equation (1) can be calculated:

$$f(x_1, x_2, \ldots, x_N) = \int_0^1 f(x_1, \ldots, x_N \mid p)\, g(p)\, dp = \frac{1}{B(x_0, n_0 - x_0)} \int_0^1 p^{x_0 + f - 1} (1 - p)^{n_0 + N - x_0 - f - 1}\, dp \tag{9}$$

This integral can be solved using the definition of the Beta function, with the following
substitutions [See Appendix, equation (A.2)]:

$$x_0 + f = z$$

$$n_0 + N - x_0 - f = w$$

Rewriting equation (9):

$$f(x_1, x_2, \ldots, x_N) = \frac{1}{B(x_0, n_0 - x_0)} \int_0^1 p^{z - 1} (1 - p)^{w - 1}\, dp$$

Now, the integral is the Beta function $B(z, w)$. So substituting back $z = x_0 + f$ and
$w = n_0 + N - x_0 - f$:

$$f(x_1, x_2, \ldots, x_N) = \frac{B(x_0 + f,\; n_0 + N - x_0 - f)}{B(x_0, n_0 - x_0)} \tag{10}$$


So finally, the posterior distribution is obtained by inserting equations (8), (3) and (10) into
equation (1):

$$f(p \mid x_1, x_2, \ldots, x_N) = \frac{1}{B(x_0 + f,\; n_0 + N - x_0 - f)}\, p^{x_0 + f - 1} (1 - p)^{n_0 + N - x_0 - f - 1} \tag{11}$$

Comparing equation (11) with equation (3), it is noted that the posterior distribution is from
the Beta family, with the parameters modified as follows:

$$x_0 \Rightarrow x_0 + f$$

$$n_0 \Rightarrow n_0 + N$$

Therefore, the posterior mean (Bayes estimate for $p$) and posterior variance are obtained by making
the corresponding replacements in the expressions of the prior mean and variance, equations (4)
and (5), that is:

$$M' = \frac{x_0 + f}{n_0 + N} \tag{12}$$

$$V' = \frac{(x_0 + f)(n_0 + N - x_0 - f)}{(n_0 + N)^2 (n_0 + N + 1)} \tag{13}$$

It is noted from equation (12) that the posterior mean, that is, the Bayesian update, is the quotient
between the pseudo-failures plus observed failures and the pseudo-demands plus the trial
demands.
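
The Beta conversion and update of equations (6), (7), (12) and (13) can be sketched as below. The function names are hypothetical and the sketch is only an illustration of the algebra; any prior mean, variance, failure count, or demand count passed in is a placeholder rather than a value taken from the study.

```python
def beta_prior_from_moments(mean, variance):
    """Equations (7) and (6): pseudo-demands n0 and pseudo-failures x0."""
    if not 0.0 < variance < mean * (1.0 - mean):
        raise ValueError("variance must lie between 0 and mean * (1 - mean)")
    n0 = mean * (1.0 - mean) / variance - 1.0
    x0 = mean * n0
    return x0, n0


def beta_update(x0, n0, failures, demands):
    """Equations (12) and (13): posterior mean and variance."""
    x1 = x0 + failures          # pseudo-failures plus observed failures
    n1 = n0 + demands           # pseudo-demands plus trial demands
    posterior_mean = x1 / n1
    posterior_variance = x1 * (n1 - x1) / (n1 * n1 * (n1 + 1.0))
    return posterior_mean, posterior_variance
```

Updating with zero failures and zero demands returns the prior mean and variance, which is a convenient sanity check on the conversion.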


Gamma  Prior  Distribution 

The generally used prior distribution for the time related failures $\lambda$ is the Gamma distribution. This
distribution is adequate to represent failure rates and it is mathematically convenient, being a
conjugate prior, as the following derivations will prove.

The problem to solve is analogous to the previous one, but here an estimate for the time related
failure is needed, and therefore a Gamma distribution is chosen as prior (generic data). Also, a
random sample was obtained from the group of components of interest (analyzing failure records)
so that the total number of failures is known, as well as the total exposure corresponding to those
failures (plant specific data).
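The prior distribution $g(\lambda)$ is here given by the Gamma distribution [See Appendix, equation
(B.2)]:

$$g(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\beta \lambda} \tag{14}$$

where $\alpha > 0$, $\beta > 0$, and $\Gamma(\alpha)$ is the gamma function, given by [See Appendix, equation (A.4)]:

$$\Gamma(z) = \int_0^{\infty} t^{z - 1} e^{-t}\, dt$$

The mean and variance of this distribution are given by [See Appendix, equations (B.5) and
[illegible]]:

$$M = \frac{\alpha}{\beta} \tag{15}$$

$$V = \frac{\alpha}{\beta^2} \tag{16}$$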


From equation (15) it is seen that $\alpha$ can be interpreted as failures for the prior distribution and $\beta$
as exposure, therefore called "pseudo failures" and "pseudo exposure," respectively.

The prior distribution is usually specified by a mean and variance, rather than the parameters $\alpha$ and
$\beta$. A conversion is necessary to obtain $\alpha$ and $\beta$ given mean and variance:

From equation (15):

$$\alpha = M \beta \tag{17}$$

Inserting Equation (17) into Equation (16):

$$V = \frac{M \beta}{\beta^2} = \frac{M}{\beta}$$

So now:

$$\alpha = \frac{M^2}{V} \tag{18}$$

$$\beta = \frac{M}{V} \tag{19}$$

Equations (18) and (19) allow one to obtain $\alpha$ and $\beta$ given mean $M$ and variance $V$.


Let us now look into the process of analyzing the failure records (random experiment) to update the
prior distribution. If failed items, in the group of components under study, are replaced or repaired
"immediately" after failure, and assuming that failures occur independently and at a constant rate in
time and across different items, then for any given item (and its corresponding replacement, if it fails) a
Poisson process is generated with parameter $\lambda t$, $\lambda$ being the constant failure rate and $t$ the time
length. Defining a random variable $X_i$ representing the number of failures for the $i$th item, then $X_i$
follows the Poisson distribution, that is:

$$f(x_i \mid \lambda) = \frac{(\lambda t)^{x_i} e^{-\lambda t}}{x_i!}$$

Now, the joint density of an independent random sample of $n$ items is given by:

$$f(x_1, x_2, \ldots, x_n \mid \lambda) = \prod_{i=1}^{n} f(x_i \mid \lambda) = \frac{(\lambda t)^{f} e^{-\lambda t n}}{\prod_{i=1}^{n} x_i!} \tag{20}$$

where $f = \sum x_i$ is the total number of failures observed. Note that $f$ is also Poisson distributed.
The denominator in Equation (1) can now be calculated:

$$f(x_1, x_2, \ldots, x_n) = \int_0^{\infty} f(x_1, x_2, \ldots, x_n \mid \lambda)\, g(\lambda)\, d\lambda = \frac{t^{f} \beta^{\alpha}}{\prod x_i!\; \Gamma(\alpha)} \int_0^{\infty} \lambda^{\alpha + f - 1} e^{-(\beta + tn)\lambda}\, d\lambda \tag{21}$$

The integral can be solved using the definition of the Gamma function [See Appendix, equation
(A.4)], so equation (21) has to be rearranged to an equivalent form as a Gamma function.

Multiplying and dividing by $(\beta + tn)^{\alpha + f - 1}$:

$$f(x_1, x_2, \ldots, x_n) = \frac{t^{f} \beta^{\alpha}}{\prod x_i!\; \Gamma(\alpha)\, (\beta + tn)^{\alpha + f}} \int_0^{\infty} \left[(\beta + tn)\lambda\right]^{\alpha + f - 1} e^{-(\beta + tn)\lambda}\, d\left[(\beta + tn)\lambda\right]$$

Now, substituting inside the integral:

$$\alpha + f = z$$

$$(\beta + tn)\lambda = \tau$$

the integral is the Gamma function $\Gamma(z)$, where $z = \alpha + f$, according to the definition.
Then the integral can be solved:

$$f(x_1, x_2, \ldots, x_n) = \frac{t^{f} \beta^{\alpha}\, \Gamma(\alpha + f)}{\prod x_i!\; \Gamma(\alpha)\, (\beta + tn)^{\alpha + f}} \tag{22}$$

So finally, the posterior distribution is obtained by inserting equations (20), (14) and (22) into
equation (1):

$$f(\lambda \mid x_1, x_2, \ldots, x_n) = \frac{(\beta + tn)^{\alpha + f}}{\Gamma(\alpha + f)}\, \lambda^{\alpha + f - 1} e^{-\lambda(\beta + tn)} \tag{23}$$

Comparing equation (23) with equation (14), it is noted that the posterior distribution is also from
the Gamma family (a conjugate prior) with the parameters modified as follows:

$$\alpha \Rightarrow \alpha + f$$

$$\beta \Rightarrow \beta + tn = \beta + T$$

where $T = nt$ is the total exposure time.

Therefore, the posterior mean (Bayes estimate for $\lambda$) and posterior variance are obtained by making
the corresponding replacements in the expressions of the prior mean and variance, equations (15)
and (16), that is:

$$M' = \frac{\alpha + f}{\beta + T} \tag{24}$$

$$V' = \frac{\alpha + f}{(\beta + T)^2} \tag{25}$$

It is noted from equation (24) that the posterior mean is obtained as pseudo failures plus observed
failures divided by pseudo exposure plus observed exposure.
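
The Gamma conversion and update can be sketched in the same way. The function names are hypothetical, and any prior moments, failure counts, or exposure times supplied are placeholders; exposure must be in the same time unit as the failure rate.

```python
def gamma_prior_from_moments(mean, variance):
    """Convert a prior mean and variance to Gamma parameters (alpha, beta)."""
    if mean <= 0.0 or variance <= 0.0:
        raise ValueError("mean and variance must be positive")
    beta = mean / variance      # pseudo exposure
    alpha = mean * beta         # pseudo failures
    return alpha, beta


def gamma_update(alpha, beta, failures, exposure):
    """Posterior mean and variance after observing failures over a total exposure."""
    alpha_post = alpha + failures
    beta_post = beta + exposure
    return alpha_post / beta_post, alpha_post / (beta_post * beta_post)
```

With zero observed failures the posterior mean always falls below the prior mean, since only the pseudo exposure in the denominator grows.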


A Procedure to Perform a Bayesian Update

This section describes a step by step procedure to obtain a Bayesian updated estimate of time
failure rates and failure-on-demand probabilities, given a lognormally distributed generic rate,
specified by its mean value and error factor.

* Step 1: Find variance V for the lognormal.

If the generic data is lognormally distributed, and is given by its mean value $M$ and error factor $EF$,
its variance $V$ is given by [See Appendix, equation (C.12)]:

$$V = \exp(2\mu + \sigma^2)\left(\exp(\sigma^2) - 1\right) \tag{26}$$

where $\mu$ and $\sigma^2$ are the mean and variance for the associated normally distributed variable.
According to equation (C.7) in the Appendix:

$$M = \exp\left(\mu + \frac{\sigma^2}{2}\right)$$

Taking the log of both sides and multiplying by 2:

$$2\mu + \sigma^2 = 2 \log M \tag{27}$$

The variance $\sigma^2$ is related to the error factor [See Appendix, equation (C.15)]:

$$\sigma = \frac{\log(EF)}{1.645} \tag{28}$$

The error factor is the ratio of the 95th percentile to the median of the lognormal distribution.
Now, replacing Equations (27) and (28) into equation (26):

$$V = M^2 \left(\exp\left[\left(\frac{\log(EF)}{1.645}\right)^2\right] - 1\right) \tag{29}$$


* Step 2: Conversion to a Beta or Gamma Distribution

a) Failure-on-demand:

Convert the lognormal distribution to a Beta, preserving the mean and variance, and obtain the
parameters $x_0$ and $n_0$ as given by equations (7) and (6):

$$n_0 = \frac{M (1 - M)}{V} - 1$$

$$x_0 = \frac{M^2 (1 - M)}{V} - M$$

b) Time Related Failures:

Convert the lognormal distribution to a Gamma, preserving the mean and variance, and obtain
the parameters $\alpha$ and $\beta$ given by equations (18) and (19):

$$\alpha = \frac{M^2}{V}$$

$$\beta = \frac{M}{V}$$


* Step 3: Perform Bayesian update:

a) Failure-on-demand

With the observed failures $f$ in $N$ demands, calculate the posterior mean $M'$ and variance $V'$ per
equations (12) and (13):

$$M' = \frac{x_0 + f}{n_0 + N}$$

$$V' = \frac{(x_0 + f)(n_0 + N - x_0 - f)}{(n_0 + N)^2 (n_0 + N + 1)}$$

b) Time Related Failures

With the observed failures $f$ and total exposure $T$, find the posterior mean $M'$ and variance $V'$ per
equations (24) and (25):

$$M' = \frac{\alpha + f}{\beta + T}$$

$$V' = \frac{\alpha + f}{(\beta + T)^2}$$


* Step 4: Convert the distribution back into lognormal, calculating the error factor

The transformation is made preserving mean and variance. The posterior error factor $EF'$ can be
obtained using equation (29), replacing all variables with the corresponding posteriors, and solving
for the error factor:

$$EF' = \exp\left(1.645 \sqrt{\log\left(1 + \frac{V'}{M'^2}\right)}\right)$$

The final updated estimate is now given by its mean $M'$ and error factor $EF'$.
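
The four steps can be strung together for the time-related case as in the sketch below. It is a minimal illustration assuming natural logarithms and the 1.645 factor used above; the function names are hypothetical and any numeric inputs are placeholders, not study values.

```python
import math

Z95 = 1.645  # standard normal 95th-percentile factor used with the error factor


def lognormal_variance(mean, error_factor):
    """Step 1: variance of a lognormal given its mean and error factor."""
    sigma = math.log(error_factor) / Z95
    return mean * mean * (math.exp(sigma * sigma) - 1.0)


def error_factor_from_moments(mean, variance):
    """Step 4: error factor of the lognormal with the given mean and variance."""
    sigma_squared = math.log(1.0 + variance / (mean * mean))
    return math.exp(Z95 * math.sqrt(sigma_squared))


def update_time_related(mean, error_factor, failures, exposure):
    """Steps 1 to 4 for a time-related failure rate; returns (mean, error factor)."""
    variance = lognormal_variance(mean, error_factor)      # Step 1
    beta = mean / variance                                  # Step 2b
    alpha = mean * beta
    post_mean = (alpha + failures) / (beta + exposure)      # Step 3b
    post_variance = (alpha + failures) / (beta + exposure) ** 2
    return post_mean, error_factor_from_moments(post_mean, post_variance)
```

The failure-on-demand case follows the same pattern, with the Beta conversion and update substituted in Steps 2 and 3.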


Appendix

A. Beta Distribution

The Beta distribution is given by the following function [Ref. 2, pg. 658]:

$$g(p) = \frac{1}{B(x_0, n_0 - x_0)}\, p^{x_0 - 1} (1 - p)^{n_0 - x_0 - 1} \tag{A.1}$$

where $0 < p < 1$, $n_0 > x_0 > 0$, and $B$ is the Beta function, defined as:

$$B(z, w) = \int_0^1 t^{z - 1} (1 - t)^{w - 1}\, dt \tag{A.2}$$

The Beta function is related to the Gamma function as follows [Ref. 3, pg. 258, Eq. 6.2.2]:

$$B(z, w) = \frac{\Gamma(z)\, \Gamma(w)}{\Gamma(z + w)} \tag{A.3}$$

where $\Gamma(\cdot)$ is the Gamma function, given by [Ref. 3, pg. 255, Eq. 6.1.1]:

$$\Gamma(z) = \int_0^{\infty} t^{z - 1} e^{-t}\, dt \tag{A.4}$$

A recurrence formula can be obtained by computing $\Gamma(z + 1)$ and integrating by parts:

$$\Gamma(z + 1) = \int_0^{\infty} t^{z} e^{-t}\, dt$$

Now defining $u = t^{z}$ and $dv = e^{-t}\, dt$, and calculating $du = z t^{z - 1}\, dt$ and $v = -e^{-t}$, the integration by
parts can proceed, using the scheme:

$$\int u\, dv = uv - \int v\, du$$

$$\Gamma(z + 1) = \left. t^{z} \left(-e^{-t}\right) \right|_0^{\infty} - \int_0^{\infty} \left(-e^{-t}\right) z t^{z - 1}\, dt$$

Note that $t^{z} e^{-t}$ goes to zero in both limits, when $t$ goes to zero and $t$ goes to infinity, therefore
the first term in the right hand side vanishes.

$$\Gamma(z + 1) = z \int_0^{\infty} t^{z - 1} e^{-t}\, dt$$

The integral obtained is the Gamma function $\Gamma(z)$ and the recurrence formula is obtained:

$$\Gamma(z + 1) = z\, \Gamma(z) \tag{A.5}$$


,»l 

A-1: Mean for the Beta Distribution

The mean is the expected value of the random variable p, $E[p] = \bar{p}$, and is a measure of central
location of the density of p. It is computed as:

$$ \bar{p} = E[p] = \int_0^1 p \, g(p) \, dp = \frac{1}{B(x_0, n_0 - x_0)} \int_0^1 p \, p^{x_0 - 1} (1 - p)^{n_0 - x_0 - 1} \, dp = \frac{1}{B(x_0, n_0 - x_0)} \int_0^1 p^{x_0} (1 - p)^{n_0 - x_0 - 1} \, dp \tag{A.6} $$

Substituting variables inside the integral:

x0 = z - 1 => z = x0 + 1
n0 - x0 = w

Now equation (A.6) can be rewritten as:

$$ \bar{p} = \frac{1}{B(x_0, n_0 - x_0)} \int_0^1 p^{z-1} (1 - p)^{w-1} \, dp $$

The integral now is the Beta function B(z,w) as defined in equation (A.2) where z = x0 + 1 and
w = n0 - x0 as defined by the change of variables. Therefore:

$$ \bar{p} = \frac{B(x_0 + 1, n_0 - x_0)}{B(x_0, n_0 - x_0)} $$

Using the relation with the Gamma function given by equation (A.3):

$$ \bar{p} = \frac{\Gamma(x_0 + 1)\,\Gamma(n_0 - x_0) / \Gamma(n_0 + 1)}{\Gamma(x_0)\,\Gamma(n_0 - x_0) / \Gamma(n_0)} = \frac{\Gamma(n_0)\,\Gamma(x_0 + 1)}{\Gamma(n_0 + 1)\,\Gamma(x_0)} $$

Now making use of the recurrence formula (A.5):

$$ \bar{p} = \frac{\Gamma(n_0)\, x_0\, \Gamma(x_0)}{n_0\, \Gamma(n_0)\, \Gamma(x_0)} = \frac{x_0}{n_0} $$

So the mean value for the Beta distribution is:

$$ \bar{p} = \frac{x_0}{n_0} \tag{A.7} $$


A-2: Variance for the Beta Distribution

The variance V of a random variable p is the measure of the spread or dispersion of its density and
is expressed as the following expected value:

$$ V = E\left[(p - \bar{p})^2\right] $$

Expanding the squared term:

$$ V = E\left[p^2 - 2 p \bar{p} + \bar{p}^2\right] = E[p^2] - 2 E[p]\,\bar{p} + \bar{p}^2 = E[p^2] - 2\bar{p}^2 + \bar{p}^2 = E[p^2] - \bar{p}^2 \tag{A.8} $$

So E[p^2] needs to be calculated:

$$ E[p^2] = \frac{1}{B(x_0, n_0 - x_0)} \int_0^1 p^2 \, p^{x_0 - 1} (1 - p)^{n_0 - x_0 - 1} \, dp = \frac{1}{B(x_0, n_0 - x_0)} \int_0^1 p^{x_0 + 1} (1 - p)^{n_0 - x_0 - 1} \, dp \tag{A.9} $$

Again, changing variables inside the integral:

x0 + 1 = z - 1 => z = x0 + 2
n0 - x0 = w

Equation (A.9) can be rewritten as:

$$ E[p^2] = \frac{1}{B(x_0, n_0 - x_0)} \int_0^1 p^{z-1} (1 - p)^{w-1} \, dp $$

Now the integral is the Beta function B(z,w) as defined in equation (A.2), where z = x0 + 2 and
w = n0 - x0 as defined by the change of variables. Hence:

$$ E[p^2] = \frac{B(x_0 + 2, n_0 - x_0)}{B(x_0, n_0 - x_0)} $$

Using the relation with the Gamma function given by equation (A.3):

$$ E[p^2] = \frac{\Gamma(x_0 + 2)\,\Gamma(n_0 - x_0) / \Gamma(n_0 + 2)}{\Gamma(x_0)\,\Gamma(n_0 - x_0) / \Gamma(n_0)} = \frac{\Gamma(n_0)\,\Gamma(x_0 + 2)}{\Gamma(x_0)\,\Gamma(n_0 + 2)} $$

Now using the recurrence formula (A.5):

$$ E[p^2] = \frac{\Gamma(n_0)\,(x_0 + 1)\, x_0\, \Gamma(x_0)}{\Gamma(x_0)\,(n_0 + 1)\, n_0\, \Gamma(n_0)} = \frac{x_0 (x_0 + 1)}{n_0 (n_0 + 1)} \tag{A.10} $$

Inserting equation (A.10) into equation (A.8), the variance is obtained:

$$ V = E[p^2] - \bar{p}^2 = \frac{x_0 (x_0 + 1)}{n_0 (n_0 + 1)} - \frac{x_0^2}{n_0^2} = \frac{n_0 x_0 (x_0 + 1) - x_0^2 (n_0 + 1)}{n_0^2 (n_0 + 1)} = \frac{n_0 x_0^2 + n_0 x_0 - n_0 x_0^2 - x_0^2}{n_0^2 (n_0 + 1)} = \frac{x_0 (n_0 - x_0)}{n_0^2 (n_0 + 1)} $$

Hence, the variance for the Beta distribution is given by:

$$ V = \frac{x_0 (n_0 - x_0)}{n_0^2 (n_0 + 1)} \tag{A.11} $$
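The closed forms (A.7) and (A.11) can be checked numerically against the Beta-function ratios they were derived from. The sketch below is only a sanity check under stated assumptions: the helper names `beta_fn` and `beta_moments` and the parameter values are hypothetical, and the Beta function is evaluated through relation (A.3) using the standard library `math.gamma`.

```python
import math


def beta_fn(z, w):
    """Beta function via its Gamma-function relation, equation (A.3)."""
    return math.gamma(z) * math.gamma(w) / math.gamma(z + w)


def beta_moments(x0, n0):
    """Mean and variance from the Beta-function ratios used in A-1 and A-2."""
    norm = beta_fn(x0, n0 - x0)
    mean = beta_fn(x0 + 1.0, n0 - x0) / norm
    second = beta_fn(x0 + 2.0, n0 - x0) / norm  # E[p^2]
    return mean, second - mean ** 2


x0, n0 = 2.5, 40.0  # hypothetical parameters with n0 > x0 > 0
mean, var = beta_moments(x0, n0)
print(mean, x0 / n0)                                  # compare with (A.7)
print(var, x0 * (n0 - x0) / (n0 ** 2 * (n0 + 1.0)))   # compare with (A.11)
```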
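B. Gamma Distribution

The Gamma distribution is given by the following function [Ref. 2, pg. 658]:

$$ g(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \, \lambda^{\alpha - 1} e^{-\beta\lambda} \tag{B.1} $$

where λ > 0, α, β > 0 and Γ(.) is the Gamma function, defined as:

$$ \Gamma(z) = \int_0^\infty t^{z-1} e^{-t} \, dt \tag{B.2} $$

A recurrence formula for the Gamma function was found in Section A (equation A.5):

$$ \Gamma(z + 1) = z\,\Gamma(z) \tag{B.3} $$

B.1: Mean for the Gamma Distribution

The mean is the expected value of the random variable λ ($E[\lambda] = \bar{\lambda}$) and is calculated as:

$$ \bar{\lambda} = E[\lambda] = \int_0^\infty \lambda \, g(\lambda) \, d\lambda = \int_0^\infty \lambda \, \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta\lambda} \, d\lambda = \frac{1}{\beta\,\Gamma(\alpha)} \int_0^\infty (\beta\lambda)^{\alpha} e^{-\beta\lambda} \, d(\beta\lambda) \tag{B.4} $$

Changing variables inside the integral:

βλ = t
α = z - 1 => z = α + 1

equation (B.4) can be rewritten as:

$$ \bar{\lambda} = \frac{1}{\beta\,\Gamma(\alpha)} \int_0^\infty t^{z-1} e^{-t} \, dt $$

The integral is the Gamma function Γ(z) as defined in equation (B.2), where z = α + 1, as defined
in the above change of variables. Hence:

$$ \bar{\lambda} = \frac{\Gamma(\alpha + 1)}{\beta\,\Gamma(\alpha)} $$

Using the recurrence formula (B.3):

$$ \bar{\lambda} = \frac{\alpha\,\Gamma(\alpha)}{\beta\,\Gamma(\alpha)} = \frac{\alpha}{\beta} $$

So, the mean value for the Gamma distribution is:

$$ \bar{\lambda} = \frac{\alpha}{\beta} \tag{B.5} $$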


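B.2: Variance for the Gamma Distribution

The variance V can be expressed as:

$$ V = E[\lambda^2] - \bar{\lambda}^2 \tag{B.6} $$

as found in Section A-2, equation (A.8), so E[λ^2] has to be computed.

$$ E[\lambda^2] = \int_0^\infty \lambda^2 \, g(\lambda) \, d\lambda = \int_0^\infty \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha + 1} e^{-\beta\lambda} \, d\lambda = \frac{1}{\beta^2\,\Gamma(\alpha)} \int_0^\infty (\beta\lambda)^{\alpha + 1} e^{-\beta\lambda} \, d(\beta\lambda) \tag{B.7} $$

Now changing variables inside the integral:

βλ = t
α + 1 = z - 1 => z = α + 2

Equation (B.7) can be rewritten as:

$$ E[\lambda^2] = \frac{1}{\beta^2\,\Gamma(\alpha)} \int_0^\infty t^{z-1} e^{-t} \, dt $$

Now, the integral is the Gamma function Γ(z) as defined in equation (B.2), where z = α + 2 as
defined by the change of variables. Therefore:

$$ E[\lambda^2] = \frac{\Gamma(\alpha + 2)}{\beta^2\,\Gamma(\alpha)} $$

Using the recurrence formula (B.3):

$$ E[\lambda^2] = \frac{(\alpha + 1)\,\Gamma(\alpha + 1)}{\beta^2\,\Gamma(\alpha)} = \frac{(\alpha + 1)\,\alpha\,\Gamma(\alpha)}{\beta^2\,\Gamma(\alpha)} = \frac{\alpha(\alpha + 1)}{\beta^2} \tag{B.8} $$

Inserting equation (B.8) into equation (B.6), the variance is obtained:

$$ V = E[\lambda^2] - \bar{\lambda}^2 = \frac{\alpha(\alpha + 1)}{\beta^2} - \frac{\alpha^2}{\beta^2} = \frac{\alpha^2 + \alpha - \alpha^2}{\beta^2} = \frac{\alpha}{\beta^2} $$

Hence, the variance for the Gamma distribution is given by:

$$ V = \frac{\alpha}{\beta^2} $$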


for  x > 0 


(C.4) 


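C.1: Mean of the Lognormal Distribution

The mean of the lognormal distribution is computed as the expected value of the random variable x.

$$ \bar{x} = E[x] = \int_0^\infty x \, f_X(x) \, dx = \frac{1}{\sqrt{2\pi}\,\sigma} \int_0^\infty \exp\left( -\frac{(\log x - \mu)^2}{2\sigma^2} \right) dx $$

Changing variables:

x = e^y
dx = e^y dy
x: 0 -> ∞ => y: -∞ -> ∞

Then,

$$ \bar{x} = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{(y - \mu)^2}{2\sigma^2} \right) e^{y} \, dy = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{y^2 - 2\mu y + \mu^2 - 2\sigma^2 y}{2\sigma^2} \right) dy \tag{C.5} $$

To solve the integral, it is necessary to add and subtract constant terms inside the exponential so
that it can be expressed as (y-b)^2.

The term $2\mu\sigma^2 + \sigma^4$ has to be added and subtracted in equation (C.5) for this purpose, so now:

$$ \bar{x} = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{y^2 - 2y(\mu + \sigma^2) + (\mu + \sigma^2)^2 - 2\mu\sigma^2 - \sigma^4}{2\sigma^2} \right) dy $$

$$ = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{\left[y - (\mu + \sigma^2)\right]^2}{2\sigma^2} + \mu + \frac{\sigma^2}{2} \right) dy $$

$$ = \frac{e^{\mu + \sigma^2/2}}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{\left[y - (\mu + \sigma^2)\right]^2}{2\sigma^2} \right) dy \tag{C.6} $$

Making another change of variables inside the integral in (C.6):

$$ u = \frac{y - (\mu + \sigma^2)}{\sqrt{2}\,\sigma}, \qquad du = \frac{dy}{\sqrt{2}\,\sigma} \;\Rightarrow\; dy = \sqrt{2}\,\sigma \, du $$

Then,

$$ \bar{x} = \frac{e^{\mu + \sigma^2/2}}{\sqrt{2\pi}\,\sigma} \, \sqrt{2}\,\sigma \int_{-\infty}^{\infty} e^{-u^2} \, du = \frac{e^{\mu + \sigma^2/2}}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-u^2} \, du $$

Any table of integrals gives the following result:

$$ \int_{-\infty}^{\infty} e^{-u^2} \, du = \sqrt{\pi} $$

So finally:

$$ \bar{x} = e^{\mu + \sigma^2/2} \tag{C.7} $$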


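C.2: Variance of the Lognormal Distribution

The variance can be calculated as shown in Equation (A.8):

$$ V = E[x^2] - \bar{x}^2 \tag{C.8} $$

$\bar{x}$ was already calculated, only E[x^2] is needed.

$$ E[x^2] = \int_0^\infty x^2 f_X(x) \, dx = \frac{1}{\sqrt{2\pi}\,\sigma} \int_0^\infty x \exp\left( -\frac{(\log x - \mu)^2}{2\sigma^2} \right) dx $$

Making the following substitution inside the integral:

x = e^y
dx = e^y dy
x: 0 -> ∞ => y: -∞ -> ∞

Then:

$$ E[x^2] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{(y - \mu)^2}{2\sigma^2} \right) e^{2y} \, dy = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{y^2 - 2\mu y + \mu^2 - 4\sigma^2 y}{2\sigma^2} \right) dy \tag{C.9} $$

Analogously as before, adding and subtracting $(4\mu\sigma^2 + 4\sigma^4)$ inside the exponential, equation (C.9)
can be written as:

$$ E[x^2] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{y^2 - 2y(\mu + 2\sigma^2) + (\mu + 2\sigma^2)^2 - 4\mu\sigma^2 - 4\sigma^4}{2\sigma^2} \right) dy $$

$$ = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{\left[y - (\mu + 2\sigma^2)\right]^2}{2\sigma^2} + 2\mu + 2\sigma^2 \right) dy $$

$$ = \frac{e^{2\mu + 2\sigma^2}}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{\left[y - (\mu + 2\sigma^2)\right]^2}{2\sigma^2} \right) dy \tag{C.10} $$

Changing variables inside the integral in (C.10):

$$ u = \frac{y - (\mu + 2\sigma^2)}{\sqrt{2}\,\sigma}, \qquad du = \frac{dy}{\sqrt{2}\,\sigma} \;\Rightarrow\; dy = \sqrt{2}\,\sigma \, du $$

$$ E[x^2] = \frac{e^{2\mu + 2\sigma^2}}{\sqrt{2\pi}\,\sigma} \, \sqrt{2}\,\sigma \int_{-\infty}^{\infty} e^{-u^2} \, du $$

The integral is the same as in the previous section:

$$ \int_{-\infty}^{\infty} e^{-u^2} \, du = \sqrt{\pi} $$

So now:

$$ E[x^2] = e^{2\mu + 2\sigma^2} \tag{C.11} $$

Inserting equations (C.11) and (C.7) into equation (C.8):

$$ V = E[x^2] - \bar{x}^2 = e^{2\mu + 2\sigma^2} - \left( e^{\mu + \sigma^2/2} \right)^2 = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2} = e^{2\mu + \sigma^2} \left( e^{\sigma^2} - 1 \right) $$

So finally:

$$ V = e^{2\mu + \sigma^2} \left( e^{\sigma^2} - 1 \right) \tag{C.12} $$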
C.3: Error Factor for the Lognormal Distribution

The Error Factor (EF) is defined as the ratio between the 95th percentile and the median or 50th
percentile.

$$ EF = \frac{x_{.95}}{x_{.50}} \tag{C.13} $$

The error factor is a measure of variation about a central tendency and is used more often than the
variance.

A relation can be established between the error factor EF and the parameter σ of the underlying
normal variable as follows:

Taking log on both sides of equation (C.13):

$$ \log EF = \log \frac{x_{.95}}{x_{.50}} = \log x_{.95} - \log x_{.50} $$

but log x = y, then:

$$ \log EF = y_{.95} - y_{.50} \tag{C.14} $$

Since the variable y is normally distributed, a change of variables can be made to the
standard normal variable z:

$$ z = \frac{y - \mu}{\sigma} \;\Rightarrow\; y = \mu + \sigma z $$

Replacing into equation (C.14):

$$ \log EF = \mu + \sigma z_{.95} - (\mu + \sigma z_{.50}) = \sigma \left( z_{.95} - z_{.50} \right) $$

But the 95th percentile of the standard normal distribution is approximately 1.645 and the 50th
percentile is zero. Then:

$$ \log EF = 1.645\,\sigma $$

So finally:

$$ \sigma = \frac{\log EF}{1.645} \tag{C.15} $$
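Equations (C.7), (C.12) and (C.15) together convert a median and error factor into a mean and variance. The sketch below illustrates that conversion in Python; the function names and the example values (median 1e-3, EF 10) are hypothetical, and it relies on the standard fact that the median of a lognormal variable is e^μ. The standard library `statistics.NormalDist` is used only to cross-check the 1.645 approximation.

```python
import math
from statistics import NormalDist

Z95 = 1.645  # approximate 95th percentile of the standard normal, per (C.15)


def lognormal_params(median, ef):
    """mu and sigma of the underlying normal from the median and error factor."""
    mu = math.log(median)          # median of a lognormal is exp(mu)
    sigma = math.log(ef) / Z95     # equation (C.15)
    return mu, sigma


def lognormal_moments(mu, sigma):
    """Mean (C.7) and variance (C.12) of the lognormal distribution."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    var = math.exp(2.0 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1.0)
    return mean, var


mu, sigma = lognormal_params(median=1e-3, ef=10.0)
mean, var = lognormal_moments(mu, sigma)
# Cross-check: ratio of the exact 95th percentile to the median, close to EF.
x95 = math.exp(mu + sigma * NormalDist().inv_cdf(0.95))
print(mean, var, x95 / 1e-3)
```

Because the lognormal is skewed to the right, the mean computed this way is always larger than the median, increasingly so as the error factor grows.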




Reference:

[1] A. Mood, F. Graybill, D. Boes, Introduction to the Theory of Statistics, 3rd ed., 1974.

[2] H. Martz, R. Waller, Bayesian Reliability Analysis.
A.1: DEMAND RELATED FAILURES

When a population of components is selected to perform a plant or vehicle-specific data
analysis for demand related failures, a generally used method is to count the number of failures
that, for that population, occurred as a result of a number N of total demand trials.
Assuming that the failure-on-demand probability p for each component is constant at each demand,
and defining a random variable $X_i$ representing the outcome of the i-th demand trial, that is failure or
success (numerical values 1 or 0 respectively), then $X_i$ follows the Bernoulli distribution, that is:

$$ P(X_i = x) = p^{x} (1 - p)^{1 - x} $$

After N demand trials, a random sample $X_1, X_2, \ldots, X_N$ is obtained. The objective is to estimate
the unknown parameter p by a single value (point estimate) using the information provided by the
sample. Several methods can be used to accomplish the desired result. The method of maximum
likelihood is chosen among the other methods because it gives estimators with desirable
properties that are fairly easy to obtain. The procedure of this method follows [1]:

Let $X_1, X_2, \ldots, X_n$ be the random sample obtained from n demand trial observations. Each of
these n random variables has a density function (probability distribution) $f(x_i; \theta)$, where θ is the
unknown parameter to estimate. Then the joint density function for the random sample is:

$$ f(X_1, X_2, \ldots, X_n; \theta) = f(X_1; \theta)\, f(X_2; \theta) \cdots f(X_n; \theta) $$

since $X_1, X_2, \ldots, X_n$ are mutually independent.

After the sample is obtained, f is a function of θ only. This function is called the likelihood
function and is indicated by L(θ). The method of maximum likelihood consists of finding the
value $\hat{\theta}$ of θ which maximizes the likelihood function. Such $\hat{\theta}$ is called the maximum likelihood
estimator of θ.


Returning to the original random sample $X_1, X_2, \ldots, X_N$, each variable being Bernoulli
distributed, the likelihood function is:

$$ L(p) = \prod_{i=1}^{N} p^{x_i} (1 - p)^{1 - x_i} = p^{\sum x_i} (1 - p)^{N - \sum x_i} $$

where $\sum x_i$ represents the sum of $X_i$ from i=1 to i=N and is therefore the total number of failures
observed.

Maximizing L is equivalent to maximizing its logarithm, i.e. Log(L), then:

$$ \mathrm{Log}(L) = \sum x_i \cdot \mathrm{Log}(p) + \left(N - \sum x_i\right) \cdot \mathrm{Log}(1 - p) $$

A byproduct of this is that the mathematics is simplified.

To obtain the maximum of Log(L), the derivative with respect to p is equated to zero:

$$ \frac{\partial\, \mathrm{Log}(L)}{\partial p} = \frac{\sum x_i}{p} - \frac{N - \sum x_i}{1 - p} = 0 $$

And solving for p, the maximum likelihood estimator is obtained:

$$ \hat{p} = \frac{\sum x_i}{N} = \frac{f}{N} $$

That is, the point estimate for p is obtained dividing the total number of failures observed, f, by the
total number of demand trials N.

It is noted that if N is fixed then $\hat{p}$ is a random variable, and the failure count $f = N\hat{p}$, being the
sum of Bernoulli variables, is distributed according to the Binomial with parameters p and N.

Now a numerical interval is desired to bound the unknown parameter p with a certain degree of
confidence. The idea is to find two functions, L(.) and U(.), of the random sample $X_1, X_2, \ldots, X_N$
so that, prior to observing the sample, there is a certain known probability that the parameter of
interest p is confined in the interval [L(.), U(.)]. That is:

$$ P\left[ L(.) \le p \le U(.) \right] = 1 - \gamma, \qquad 0 < \gamma < 1 $$

After the sample is observed, the functions L(.) and U(.) yield two numbers l and u that constitute
the confidence interval of level (1-γ).

Some confusion arises regarding the interpretation of this confidence interval. Its interpretation is
as follows: Prior to obtaining the sample, it can be stated that the probability that the obtained sample
will yield an interval (l,u) containing the unknown parameter p is (1-γ). Note that after the sample
is obtained, the values l and u are fixed and the interval (l,u) either contains the point p or not (i.e.
probability 1 or 0). Another way of looking at this would be to obtain several random samples
$X_1, X_2, \ldots, X_n$ and generate one confidence interval of level (1-γ) for each sample. Then, it is
expected, on the average and in the long run of sampling, that 100(1-γ)% of the obtained intervals
will contain the actual value of p.


Several methods are available to establish confidence limits. The so called "statistical method" [1]
is used for this case in which the probability distribution of the estimator is known. The method
consists of finding two functions L(.) and U(.), functions of the random sample. L and U are
found by solving for θ the following equations:

$$ \int_{-\infty}^{T} f_T(t; \theta) \, dt = p_1 \quad \rightarrow \quad \text{Solution } \theta = U(.) $$

$$ \int_{T}^{\infty} f_T(t; \theta) \, dt = p_2 \quad \rightarrow \quad \text{Solution } \theta = L(.) $$

where T is the estimator, function of the random sample. If the variable is discrete (i.e. it can only
take certain values), the integrals in the above equations would need to be replaced by
summations.

$p_1$ and $p_2$ can be arbitrarily chosen, though they are usually chosen to get certain desirable
properties, for example they may be selected so that the resulting interval (l, u) has minimum
length. For the present situation they will be selected so that $p_1 = p_2 = \gamma/2$ for a (1-γ)100%
confidence interval, that is "equal tails".


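Going back to the point estimator

$$ \hat{p} = \frac{\sum x_i}{N} = \frac{f}{N} $$

an upper bound for p ($p_u$) can be obtained solving the following equation for $p_u$:

$$ \sum_{s=0}^{f} \binom{N}{s} p_u^{s} (1 - p_u)^{N - s} = \frac{\gamma}{2} $$

A numerical procedure could be used to find $p_u$ given N and f, but it is more convenient to make a
transformation into a continuous variable F-distributed using the following relations:

a) The Binomial is related to the Incomplete Beta Function as follows:

$$ \sum_{s=a}^{n} \binom{n}{s} p^{s} (1 - p)^{n - s} = I_p(a, n - a + 1) $$

[Ref. 2, p. 945, 26.5.24]

b) The Incomplete Beta Function relates to the F-distribution as follows:

$$ Q(F \mid \nu_1, \nu_2) = \mathrm{Prob}[\mathcal{F} \ge F] = I_x\left( \frac{\nu_2}{2}, \frac{\nu_1}{2} \right) \quad \text{with} \quad x = \frac{\nu_2}{\nu_2 + \nu_1 F} $$

[Ref. 2, p. 945, 26.5.28]

where $\mathcal{F}$ is a random variable F-distributed with $\nu_1$ and $\nu_2$ degrees of freedom, and Q = 1 - P, P
being the cumulative probability.

Merging the above relations into one:

$$ \sum_{s=a}^{N} \binom{N}{s} p^{s} (1 - p)^{N - s} = Q(F \mid \nu_1, \nu_2) = 1 - P(F \mid \nu_1, \nu_2), \qquad F = \frac{1 - p}{p} \cdot \frac{a}{N - a + 1} $$

with $\nu_1 = 2(N - a + 1)$ and $\nu_2 = 2a$.

Rearranging and using this relationship, the equation becomes:

$$ \sum_{s=0}^{f} \binom{N}{s} p_u^{s} (1 - p_u)^{N - s} = 1 - \sum_{s=f+1}^{N} \binom{N}{s} p_u^{s} (1 - p_u)^{N - s} = 1 - Q(Y \mid \nu_1, \nu_2) = \frac{\gamma}{2} $$

where

$$ Y = \frac{1 - p_u}{p_u} \cdot \frac{f + 1}{N - f}, \qquad \nu_1 = 2N - 2f \quad \text{and} \quad \nu_2 = 2f + 2 $$

All that is needed is a value Y from the F-distribution ($\nu_1$ and $\nu_2$ degrees of freedom) whose
cumulative probability is γ/2, that is:

$$ Y = F_{\gamma/2}(2N - 2f;\, 2f + 2) = \frac{1 - p_u}{p_u} \cdot \frac{1 + f}{N - f} $$

Now, solving the equation for $p_u$:

$$ p_u = \frac{1 + f}{(1 + f) + (N - f)\, F_{\gamma/2}(2N - 2f;\, 2f + 2)} $$

Making use of the known reciprocal relation for the F-distribution:

$$ F_{p}(\nu_1; \nu_2) = \frac{1}{F_{1-p}(\nu_2; \nu_1)} $$

So the final value for $p_u$ is:

$$ p_u = \frac{(1 + f)\, F_{1-\gamma/2}(2f + 2;\, 2N - 2f)}{(N - f) + (1 + f)\, F_{1-\gamma/2}(2f + 2;\, 2N - 2f)} $$

A similar procedure is followed for the lower bound of p ($p_l$). In this case the following
equation has to be solved:

$$ \sum_{s=f}^{N} \binom{N}{s} p_l^{s} (1 - p_l)^{N - s} = \frac{\gamma}{2} $$

Again using the transformation relations, the following equation is obtained:

$$ Q(Y \mid \nu_1, \nu_2) = 1 - P(Y \mid \nu_1, \nu_2) = \frac{\gamma}{2} $$

where

$$ Y = \frac{1 - p_l}{p_l} \cdot \frac{f}{N - f + 1}, \qquad \nu_1 = 2N - 2f + 2 \quad \text{and} \quad \nu_2 = 2f $$

Solving for $p_l$:

$$ p_l = \frac{f}{f + (N - f + 1)\, F_{1-\gamma/2}(2N - 2f + 2;\, 2f)} $$

Or equivalently, using the reciprocal relation for the F-distribution:

$$ p_l = \frac{f\, F_{\gamma/2}(2f;\, 2N - 2f + 2)}{(N - f + 1) + f\, F_{\gamma/2}(2f;\, 2N - 2f + 2)} $$

Let's summarize the results obtained for the demand related failures:

$$ \hat{p} = \frac{f}{N} $$

$$ p_l = \frac{f\, F_{\gamma/2}(2f;\, 2N - 2f + 2)}{(N - f + 1) + f\, F_{\gamma/2}(2f;\, 2N - 2f + 2)} $$

$$ p_u = \frac{(1 + f)\, F_{1-\gamma/2}(2f + 2;\, 2N - 2f)}{(N - f) + (1 + f)\, F_{1-\gamma/2}(2f + 2;\, 2N - 2f)} $$

where:

$\hat{p}$ = failure-on-demand probability point estimate
f = number of demand related failures
N = number of demands over which the f failures occurred

$p_l$ = failure-on-demand probability lower confidence bound

$p_u$ = failure-on-demand probability upper confidence bound

$F_p(\nu_1, \nu_2)$ = p-th percentile of an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom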
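The F-distribution formulas are a closed-form route to solving the two binomial tail equations. As a hedged illustration, the same bounds can be obtained by solving those tail equations directly with bisection, which needs nothing beyond the Python standard library. This is a swapped-in numerical technique, not the procedure used by CARP; the function names and the example counts are hypothetical.

```python
import math


def binom_tail(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, s) * p ** s * (1.0 - p) ** (n - s)
               for s in range(k, n + 1))


def solve_increasing(g, target, iters=200):
    """Bisection on (0, 1) for g(p) = target, with g increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def demand_bounds(f, n, gamma=0.10):
    """Point estimate and equal-tails (1 - gamma) confidence bounds for p."""
    p_hat = f / n
    # Lower bound: P[X >= f | p_l] = gamma / 2 (taken as 0 when f = 0).
    lower = 0.0 if f == 0 else solve_increasing(
        lambda p: binom_tail(f, n, p), gamma / 2.0)
    # Upper bound: P[X <= f | p_u] = gamma / 2,
    # i.e. P[X >= f + 1 | p_u] = 1 - gamma / 2.
    upper = 1.0 if f == n else solve_increasing(
        lambda p: binom_tail(f + 1, n, p), 1.0 - gamma / 2.0)
    return p_hat, lower, upper


print(demand_bounds(0, 55))  # hypothetical: no failures in 55 demands
print(demand_bounds(1, 55))  # hypothetical: one failure in 55 demands
```

For f = 0 the upper-bound equation reduces to (1 - p_u)^N = γ/2, so the bisection result can be compared against 1 - (γ/2)^(1/N) as a quick check.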


A.2: TIME RELATED FAILURES

For the case of time related failures, a common practice to perform a plant or vehicle-specific data
analysis is to count the number of failures, for a certain population of size n, that occurred during a
certain fixed period of time t. If failed items are replaced or repaired after failure, and
assuming that failures occur independently and at a constant rate in time across different items, then
for any given item (and its corresponding replacements if it fails) a Poisson process is generated
with parameter λt, λ being the constant failure rate and t the time length.

Defining a random variable $X_i$ representing the number of failures for the i-th item, $X_i$ follows the
Poisson distribution, that is,

$$ P(X_i = x) = \frac{(\lambda t)^{x} e^{-\lambda t}}{x!} $$

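With the number of failures for each of the n items, a random sample $X_1, X_2, \ldots, X_n$ is obtained.

To estimate λ from the information provided by the sample, the method of maximum
likelihood is used. The likelihood function is:

$$ L(\lambda) = \prod_{i=1}^{n} f(x_i; \lambda) = \prod_{i=1}^{n} \frac{(\lambda t)^{x_i} e^{-\lambda t}}{x_i!} = \frac{e^{-n\lambda t} (\lambda t)^{\sum x_i}}{\prod_{i=1}^{n} x_i!} $$

Now finding the maximum for Log(L):

$$ \mathrm{Log}(L) = -n\lambda t + \sum_{i=1}^{n} x_i \log(\lambda t) - \sum_{i=1}^{n} \log(x_i!) $$

$$ \frac{\partial\, \mathrm{Log}(L)}{\partial \lambda} = -nt + \frac{\sum x_i}{\lambda} = 0 $$

$$ \hat{\lambda} = \frac{\sum x_i}{nt} = \frac{f}{T} $$

So the point estimator for λ is obtained dividing the total number of failures by the total time
exposure, i.e. nt = T.

It is noted that $\hat{\lambda}$ is also a random variable, and the failure count $f = \hat{\lambda} T$, being the sum of
independent Poisson processes and assuming n and T fixed and known, is Poisson distributed also.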
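The corresponding confidence interval can be found in a similar fashion as for p, the failure-on-
demand probability.

For the upper bound $\lambda_u$ the following equation has to be solved:

$$ \sum_{s=0}^{f} \frac{(\lambda_u T)^{s} e^{-\lambda_u T}}{s!} = \frac{\gamma}{2} $$

Using the following relation between the Poisson and the Chi-Squared distribution with ν degrees
of freedom:

$$ Q(\chi^2 \mid \nu) = \mathrm{Prob}[X^2 \ge \chi^2] = \sum_{j=0}^{c-1} \frac{e^{-m} m^{j}}{j!} \quad \text{with} \quad c = \frac{\nu}{2} \ (\nu \text{ even}), \quad m = \frac{\chi^2}{2} $$

[Ref. 2, p. 941, 26.4.21]

Incorporating this relation into the original equation:

$$ Q(2\lambda_u T \mid \nu = 2f + 2) = 1 - P(2\lambda_u T \mid \nu = 2f + 2) = \frac{\gamma}{2} $$

$$ P(2\lambda_u T \mid \nu = 2f + 2) = 1 - \frac{\gamma}{2} $$

Now solving for $\lambda_u$:

$$ \lambda_u = \frac{\chi^2_{1-\gamma/2}(2f + 2)}{2T} $$

For the lower bound $\lambda_l$, the equation to solve is:

$$ \sum_{s=f}^{\infty} \frac{(\lambda_l T)^{s} e^{-\lambda_l T}}{s!} = \frac{\gamma}{2} $$

Using the relation to the Chi-squared:

$$ Q(2\lambda_l T \mid \nu = 2f) = 1 - P(2\lambda_l T \mid \nu = 2f) = 1 - \frac{\gamma}{2} $$

$$ P(2\lambda_l T \mid \nu = 2f) = \frac{\gamma}{2} $$

Solving for $\lambda_l$:

$$ \lambda_l = \frac{\chi^2_{\gamma/2}(2f)}{2T} $$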

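Let's summarize the results obtained for the time related failures:

$$ \hat{\lambda} = \frac{f}{T} $$

$$ \lambda_l = \frac{\chi^2_{\gamma/2}(2f)}{2T} $$

$$ \lambda_u = \frac{\chi^2_{1-\gamma/2}(2f + 2)}{2T} $$

where:

$\hat{\lambda}$ = failure rate point estimate

f = number of time related failures

T = time interval over which the f failures occurred (multiplied by number of items)

$\lambda_l$ = failure rate lower confidence bound

$\lambda_u$ = failure rate upper confidence bound

$\chi^2_p(\nu)$ = p-th percentile of a Chi-squared distribution with ν degrees of freedom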
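As with the demand case, the Chi-squared percentiles are one way to solve the two Poisson tail equations. The sketch below solves those tail equations directly by bisection, using only the Python standard library. It is an illustrative alternative under stated assumptions, not CARP's implementation; the function names and the example exposure are hypothetical.

```python
import math


def poisson_cdf(k, m):
    """P[X <= k] for X ~ Poisson(m)."""
    term = math.exp(-m)
    total = term
    for s in range(1, k + 1):
        term *= m / s
        total += term
    return total


def solve_mean(k, target, iters=200):
    """Find m with poisson_cdf(k, m) = target (the CDF decreases as m grows)."""
    lo, hi = 0.0, 1.0
    while poisson_cdf(k, hi) > target:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rate_bounds(f, exposure, gamma=0.10):
    """Point estimate and equal-tails (1 - gamma) confidence bounds for the rate."""
    rate_hat = f / exposure
    # Lower bound: P[X >= f] = gamma / 2, i.e. P[X <= f - 1] = 1 - gamma / 2.
    lower = 0.0 if f == 0 else solve_mean(f - 1, 1.0 - gamma / 2.0) / exposure
    # Upper bound: P[X <= f] = gamma / 2.
    upper = solve_mean(f, gamma / 2.0) / exposure
    return rate_hat, lower, upper


print(rate_bounds(2, 1000.0))  # hypothetical: 2 failures in 1000 hours
```

The value returned by `solve_mean(f, gamma / 2)` corresponds to half of the Chi-squared percentile with 2f + 2 degrees of freedom in the summary formula for the upper bound.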
A.3: CARP - A Computer Tool

CARP (Computerized Analysis of Reliability Parameters) is a computer code for manipulating
failure data. It was designed by SAIC and has evolved over several years in support of reliability
data analysis projects. The main features of CARP include:

1. Tolerance aggregation of data

2. Determination of plant-specific failure rates (point estimates for maximum
likelihood estimators)

3. Calculation of confidence intervals for time and demand related failures

4. Bayesian updating using conjugates

5. Automatic access to a generic database

Features 2 and 3 are performed following the methodology explained in detail in the previous
sections. The only thing that is worth mentioning is that, whenever the number of failures observed
is zero, CARP uses number of failures f = 0.33 to obtain the point estimates.

Features 4 and 5 were not used for this particular study, and will not be discussed here.

Feature 1 was extensively used in this report and deserves special attention.

TOLERANCE AGGREGATION

An aggregation method is needed to combine multiple data sources into a single estimate. CARP is
able to perform this aggregation into a composite estimate using a technique which preserves the
tolerance of the individual data sources. This aggregation technique consists of three steps as
follows:

Step 1: Fit Individual Data Sources

Each individual data source for a given component type and failure mode is fitted to a log-normal
distribution described by its median and logarithmic standard deviation. The log-normal
distribution is used because it is easy to deal with computationally and is well suited to expressing
uncertainty bounds (via error or range factors).

Individual data sources provide statistical information in a variety of styles which form two broad
classes:

(1) sources that provide distributional information

(2) sources which provide failure counts and exposures

The methods used to form the log-normal uncertainty distribution depend upon the type of
information provided.

• Distributional Information: Here, distributional information is specified (e.g., mean value, point
estimate, upper and lower percentiles, etc.). Such information may be difficult to assess since
sometimes generic data sources do not provide adequate information to interpret the supplied
values. (For example, do the supplied values consider both data confidence and tolerance? Is the
point value a distribution mean, median or mode? What distributional type is used?)

• Failure Count and Exposure: This style provides the total number of failures that have occurred
over a specified time period or number of demands (or, alternatively, cycles or trials). There are
three issues of concern in using this style of information:

a. It is not possible to ascertain whether or not the information is consistent with an
assumption of constant failure rates and constant failure-on-demand probabilities unless
the times (or demands) between failures are given.

b. Generic data sources typically do not state if the data has been statistically censored.
If the last failure occurred exactly at the end of the exposure period, then the data is
uncensored. If failures were counted until a preset total failure count was reached,
then the data is Type I censored. If failures were counted for a preset time period,
then the data is Type II censored. Knowledge of the censoring scheme used to
collect the data is necessary to provide meaningful uncertainty estimates.

c. Only failure and exposure totals may be given even though the accompanying
explanatory text of a generic source may state that the population is
heterogeneous. In such cases, the information appears to have a high information
content (due to the large number of failures); however, there is no way to separate
the confidence from the data tolerance.

Step 2: Form Aggregate Distribution

The data sources are combined into a single estimate by forming the weighted sum of each input
data source's distribution function:

$$ P[X \le x] = \sum_{i=1}^{N} w_i \, P[X_i \le x] $$

where:

N = number of generic data sources

$P[X \le x]$ = distribution function of the aggregate reliability parameter

$w_i$ = weight of the i-th generic data source

$P[X_i \le x]$ = distribution function of the i-th generic data source

This technique, developed by SAIC for EPRI during the Component Reliability
Parameter Studies and based on the work of Stone [3] and Winkler [4], ensures that data tolerance
is preserved. By smearing the uncertainty of all input generic data sources, an aggregate
uncertainty bound is created which properly encompasses the entire range of uncertainty. Each
input generic data source is assumed to be log-normally distributed.
It can be shown that, regardless of the input data source distributional type(s), the mean of the
aggregate distribution is the weighted sum of the input means. Determination of the aggregate
distribution percentiles typically requires a numerical solution. Using the previous assumptions
(i.e., log-normally distributed input data sources and equal weights), the following equation
is obtained:

$$ p = \frac{1}{N} \sum_{i=1}^{N} \int_0^{x_p} \frac{1}{\sqrt{2\pi}\,\sigma_i\, t} \exp\left( -\frac{(\ln t - \ln \lambda_i)^2}{2\sigma_i^2} \right) dt $$

where:

$\lambda_i$ = median of the i-th input data source

$\sigma_i$ = logarithmic standard deviation of the i-th input data source

This latter equation is solved (i.e., the value of $x_p$ determined for a given value of p) for the 5th
percentile (p=0.05), the median (p=0.50), and the 95th percentile (p=0.95). These bounds are
subsequently converted into a log-normal distribution.

Step 3: Fit Aggregate Distribution

To facilitate use of the aggregated distributions in traditional PRA uncertainty calculations, each is
converted to a log-normal distribution.
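The percentile calculation of Step 2 can be sketched numerically as follows. This is a minimal illustration, assuming equal weights and log-normal inputs as stated in the text; the function names, the bisection solver and the three example sources are hypothetical and do not reproduce CARP's actual code. How CARP refits a log-normal to the three percentiles in Step 3 is not specified here, so the sketch stops at the percentiles.

```python
import math


def lognormal_cdf(x, median, sigma):
    """CDF of a log-normal with the given median and logarithmic std. deviation."""
    z = (math.log(x) - math.log(median)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aggregate_cdf(x, sources):
    """Equal-weight mixture of log-normal CDFs; sources = [(median, sigma), ...]."""
    return sum(lognormal_cdf(x, m, s) for m, s in sources) / len(sources)


def aggregate_percentile(p, sources, iters=200):
    """Solve aggregate_cdf(x_p) = p by bisection on log(x)."""
    lo = min(math.log(m) - 10.0 * s for m, s in sources)
    hi = max(math.log(m) + 10.0 * s for m, s in sources)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if aggregate_cdf(math.exp(mid), sources) < p:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


# Hypothetical generic data sources: (median, logarithmic standard deviation).
sources = [(1e-4, 0.7), (5e-4, 1.0), (2e-3, 0.5)]
x05, x50, x95 = (aggregate_percentile(p, sources) for p in (0.05, 0.50, 0.95))
print(x05, x50, x95)
```

Because the aggregate is a mixture, its 5th and 95th percentiles span the spread between sources as well as the uncertainty within each source, which is the sense in which the tolerance of the individual sources is preserved.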


[1]
[2]
[3]

[4]
FOREWORD


CARP is a computer code for manipulating failure data. It was designed and has evolved over
several years in support of our reliability data analysis projects. SAIC will not be responsible
for any problems resulting from the use of CARP nor will it ensure that CARP users are
supported. All reasonable requests for help will be honored in the interest of further refining
CARP, its algorithms, or data analysis in general.




INTRODUCTION 


CARP was developed by SAIC for the analysis of failure data during PRAs and other reliability
studies. In the current version, CARP can aggregate up to 20 generic data sources, assuming that
all input sources are log-normal or Poisson data sets and that the desired output is log-normal.
Additional features include:

1. Determination of plant-specific failure rates, given Poisson data sets ("numerator,
denominator" information), including uncertainty estimates using χ² or F-
distribution bounds;

2. Bayesian updating using conjugates.

3. Automatic access to a generic data base.

CARP is written in the dBASE III Plus programming language, and was compiled using Clipper
(Summer '87 Version).

CARP  REQUIREMENTS 

1. IBM PC, XT, or AT (or 100% compatible)

2. PC-DOS or MS-DOS, version 2.0 or higher.

3. Must have 384K of RAM available; use the DOS command CHKDSK to find out
if you have enough.

4. The CONFIG.SYS file should have the following (as a minimum)

FILES = 20

BUFFERS = 8

5. CARP must be able to find COMMAND.COM. Use SET COMSPEC if necessary
to identify the correct path and drive.

6. HP LaserJet printer; CARP will probably not work as well on other printers since
it sends ESC sequences to select fonts. However, CARP will provide plain text
file output that can be sent to any printer.

7. If the generic data base is to be used with CARP, a hard disk is required.
INSTALLATION


Six files comprise the CARP software:

1. CARP.EXE, the machine executable image;

2. CARP.DBF and CARP2.DBF, used to create data files for CARP.

3. SAIC.DBF, the generic data base.

4.  CQMrmTXTX,  CODES JTOC  and  CTTJ.NTX,  indexes  for  SAICDBF  to 


CARP OPERATION

CARP is menu-driven and intended to be easy to use. Of course, software is never really user-friendly; here are some guides through CARP. CARP uses 3 basic interfaces to allow you control from the keyboard. All depend heavily on the cursor. The position of the cursor is identifiable on each screen by a change in color and/or intensity. Table 1 lists the various keys that can be used to move the cursor.

The first and most familiar type of interface is the scrollable menu. You will be presented with several choices and sometimes amplifying information. There are many ways to make a choice, the simplest being to strike the first letter of the desired selection. In menus with no repeated first letters, this completes the process. In menus with repeated first letters, you must follow with an <Enter> keystroke. You can also scroll through these menus using the arrows or a variety of other keys described in Table 1.

The second interface is used to answer simple questions throughout the code. If the answer to the question is yes, simply strike the 'Y' key or, if the answer is no, the 'N' key.

The third interface requires the input of data. The area in which to type the data is indicated by a change in background color. CARP has numerous checks and balances to ensure that the data is entered correctly (e.g., characters are not accepted in numeric fields), but it cannot prevent all mistakes.

Screen 1

From DOS, type CARP to start execution of CARP. It takes some time to load CARP into memory, so be patient. Strike any key to leave the welcome screen, once it is displayed.

CARP also has a quick entry mode. Type CARP <filename> and if the file exists you will be in the main menu (Screen 3). If CARP does not find the file, you will be asked if you want to create this file (Screen 2b).


Screen  2 


After the welcome screen, you will be asked to choose one of the following:

a. Return to DOS;

b. Enter a project. You will be provided with space to enter or edit a drive/path spec and a file spec for either a new project or an existing project. Note that these specs are in DOS format, e.g.:

drive/path    d:\PRA\DATA\

file spec     PRA_CR3        Extension is not necessary; CARP will add .DBF

Note: This could be a problem if there are other dBase files in the drive/path you entered.

If the drive/path/file spec you entered already exists, CARP will load it. Otherwise you will be asked to verify that you want to create a new file with the spec you entered.

c. Continue work on an existing project. Upon selecting this option, you will be provided with a scrollable list of CARP projects. Move the cursor to the desired project and select using the <Enter> key.

Screen  3 

CARP's main menu is shown in Figure 1. From the menu, you can access all the features of CARP.

The first two choices, append and edit, allow you to add a record and edit an existing record respectively. The edit option will provide a scrollable list of components that have already been entered into the database. The append option will require that you enter component type and failure mode codes.

Note: Records in CARP are keyed to the component type code; that is, only one record may be entered for each type code.

Either option will send you to Screen 4.

The delete option gives you the capability to delete selected component type/failure mode combinations from the analysis. Information deleted in this step is not recoverable.

The generate report option sends you to CARP's report writer (Screen 6).

The load generic data option sends you to the generic data base (Screen 7).

The return to project selection menu returns you to Screen 2, from which you can start a new project or return to DOS. All analysis is saved (or deleted) at this point.


Screen  4 


CARP's data entry is controlled from this screen, referred to as the 'General' screen. The screen shows and allows editing of the component name and failure mode, accepts plant specific data and allows the user to control some of the statistical analyses. To shift CARP into the editing mode, select Edit from the green menu at the bottom of the screen. Default choices have been placed in the Bayesian Updating and Final sections of the screen. The Bayesian Updating section requires a simple Y or N (Yes or No) for whether a Bayesian Updating calculation should be performed. The choices for the final basis, or the recommended final statistic, are:

P - Plant Specific
G - Generic
B - Bayesian

The preferred scheme for fitting values to a lognormal distribution (MN-EF) preserves the central tendency of the distribution. The optional scheme preserves the spread of the distribution. These schemes are described in Appendix A. Striking the key F2 provides some help.

The green bar at the bottom of the screen provides the necessary program control options. Initially, CARP is in the display mode and the options include five numbered screens, Edit and Save. Selection of one of the five moves you to the screen containing those generic data sources, which is described under Screen 5. Selecting Edit shifts CARP to the editing mode; the selection of a numbered screen or the General screen then allows editing of that screen.

The Save option saves the data to disk and returns you to the main menu, Screen 3.

Once in the edit mode, the screen-to-screen movement is as described above. Editing the General screen is described below. Two new options now appear: AG CNTL and CALC. AG CNTL controls the aggregation methods from the three available choices:

T - Tolerance

A - Arithmetic

G - Geometric

and the weighting methods with choices:

Equal weights

User supplied weights (CARP will normalize these.)

Variance-related weights (inversely proportional).

The CALC option begins the aggregation, Bayesian updating and other necessary calculations. Upon completion of the calculation, you will be returned to the General screen in the display mode.
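
The weighting choices above can be illustrated with a short sketch. It is not CARP's code: the function names are hypothetical, and it assumes that "normalize" means scaling the weights to sum to one and that variance-related weights are proportional to 1/variance. The arithmetic and geometric aggregates shown are ordinary weighted means of point values; how CARP applies them to whole distributions is not specified here.

```python
import math


def normalize(weights):
    """Scale weights so they sum to 1 (assumed meaning of 'normalize')."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [w / total for w in weights]


def inverse_variance_weights(variances):
    """Variance-related weights, assumed proportional to 1/variance."""
    return normalize([1.0 / v for v in variances])


def arithmetic_aggregate(values, weights):
    """Weighted arithmetic mean of point estimates."""
    w = normalize(weights)
    return sum(wi * x for wi, x in zip(w, values))


def geometric_aggregate(values, weights):
    """Weighted geometric mean of (positive) point estimates."""
    w = normalize(weights)
    return math.exp(sum(wi * math.log(x) for wi, x in zip(w, values)))


if __name__ == "__main__":
    rates = [1.0e-6, 4.0e-6]  # two illustrative failure rates per hour
    print(arithmetic_aggregate(rates, [1, 1]))
    print(geometric_aggregate(rates, [1, 1]))
    print(inverse_variance_weights([1.0, 4.0]))
```

The geometric mean is pulled less by a single high source than the arithmetic mean, which is one reason an analyst might prefer it when sources differ by orders of magnitude.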




Screen  5 

There are 5 screens, labeled 1 to 4, 5 to 8, 9 to 12, 13 to 16, and 17 to 20. These show each generic data source, and can be accessed by first locating a desired screen with the → or ← keys, then <Enter>. These screens operate identically.

Each generic source looks like:

    D   MEAN      LOWER     MEDIAN    UPPER     EF    WEIGHT

1   I.FR l        5.65-07   4.0W7     7.35-07
        5.65-07   4.22-07   5.57-07   7.35-07   1.3   1.000

Note: NUREG/CR-1740; DATA FOR ALL REACTOR TYPES

Note: Failure rates and parameters MUST BE entered in the format

e.g. 5.876 x 10^-3 → 5.87 - 03

In order to edit the 'Note' field, you must move the cursor to the area and strike F5. This will allow you to type notes documenting your analysis. Strike <Cntl> W to exit and save the note field.
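
When CARP output is carried into other software, the mantissa-exponent notation has to be converted back to ordinary numbers. The sketch below is one way to do that; the function names are hypothetical. It also assumes the usual PRA definition of the error factor as the square root of the ratio of the upper (95th) to the lower (5th) percentile, which reproduces the EF of 1.3 in the example row above.

```python
import math
import re

# Mantissa, sign, exponent, with optional spaces: "5.87 - 03" or "5.65-07".
_CARP_NUMBER = re.compile(r"^\s*(\d+(?:\.\d*)?|\.\d+)\s*([+-])\s*(\d+)\s*$")


def parse_carp_number(text):
    """Convert CARP-style '5.87 - 03' into the float 5.87e-03."""
    match = _CARP_NUMBER.match(text)
    if match is None:
        raise ValueError(f"not in mantissa-exponent form: {text!r}")
    mantissa, sign, exponent = match.groups()
    return float(f"{mantissa}e{sign}{exponent}")


def error_factor(lower, upper):
    """Assumed definition: EF = sqrt(95th percentile / 5th percentile)."""
    return math.sqrt(upper / lower)


if __name__ == "__main__":
    lower = parse_carp_number("4.22-07")
    upper = parse_carp_number("7.35-07")
    print(round(error_factor(lower, upper), 1))
```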

Screen 6

The report writer control screen provides a series of scrollable menus to control the report options. The options selected are continuously displayed. The initial menu allows you to select one of the following:

- Options

- Printer

- Go

- Return

Options: Selecting options will lead you to a series of choices on how to configure your report. The first choice is between the summary and detailed reports (shown in Figure TBD). The summary report always covers all type codes, but you will be required to choose a scope (final statistics only or all statistics) for your report. The detailed report will allow you to report on all type codes or a single, selected type code. Then it will provide a choice of the scope of the report. The final choice is whether you want text (Figure TBD) and/or interval bars (Figure TBD).

Printer: The printer options allow you to direct output to a Hewlett Packard (HP) LaserJet+, HP LaserJet, other printer or a file. The LaserJet+ option requires that you write a batch file to download soft fonts. The HP LaserJet+ and HP LaserJet support both text and interval bars (graphics). Other printers generally can be configured to print out the text, but will not support the interval bars. PostScript printers are not supported. The output can also be directed to a file. You can use your own word processor or spreadsheet software and printer to produce attractive output.

Go: Starts the printing or writing to a file. Caution, interval bars take a while.

Return: Returns you to the main menu.
Screen 7

Screens 7, 8 and 9 are designed to assist you in efficiently finding generic data in the large data base that comes with CARP. Note that dBase III or IV can also be used to access the data base (SAIC.DBF). The initial generic data search screen sets the strategy for accessing the data. Search strategy 1 asks you to input a type code and a failure mode code. The appropriate fields are filled in the data base but are not regularly updated. These fields are based on an analyst's preferred coding scheme and are your responsibility to maintain. Unless a rigorous coding scheme is established and maintained, this search strategy is not recommended. Search strategy 2, the default strategy, is recommended. Selecting this strategy brings up a scrollable list of component types. Choose a component type and you will be asked to choose from available failure modes. You will then either be given the option of refining the identification by choosing from a selection of component subtypes or, in some cases, you will be directly sent to Screen 8 for choosing subtypes.

Screen 8

Screen 8 allows further identification of the component using the component type attributes. The available choices appear in the scrollable menu in the box on the left side of the screen. The <Enter> key is used to move selections into and out of the box on the right side of the screen. When the box on the right contains the proper entries, strike the F10 key to move to the next screen, Screen 9, which displays the generic data sources.

Screen  9 

Screen 9 is designed to display the generic data sources that meet the criteria specified in Screens 7 and 8. This is the final screen before loading the generic data into the calculation portions of CARP (Screen 5). Note that using the right arrow key you can view additional information and data from the sources. In this screen, the delete key will toggle sources into and out of the analysis. Deleted (removed) sources are marked with an asterisk. Once you have selected the desired sources, strike F10 to load the data into the calculational portions of CARP. The fR* key is available to return to Screen 7 and search the data base.
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APPENDIX  A:  CARP 


Aggregation 

An aggregation method is needed to combine multiple data sources into a single estimate. When dealing with surrogate or generic data, aggregation of several acceptable sources is almost always better than searching for one source that could be considered optimal. A credible method of aggregation must consider all of the following:

• Data tolerance

• Data confidence

• Data relevance

Data tolerance deals with the difference in location (i.e., one source gives a higher number than another) between the available sources. The tolerance is generally not explainable and often comes from slightly different engineering or statistical assumptions; it can be seen by comparing different generic data sources. Lacking a generic data source that exactly matches the component in question, including construction, design, operating policy and environment, maintaining this tolerance through aggregation ensures that the central estimate and bounding values consider the potential differences. Increasing the sample size will not reduce the measured tolerance, and may likely only increase it.

The data confidence deals with the measurement error associated with how well the experimentally measured parameter represents the actual parameter. As the sample size increases, the confidence interval will decrease since it deals only with statistical errors and not with engineering non-homogeneities.

The data relevance deals with the applicability of data. Numerically, sources can be weighted based on their perceived relevance.

CARP uses the percentiles from the input cumulative distribution functions (CDFs) to arrive at the percentiles of the aggregate distribution. Before aggregating, the input distributions have been fit to lognormal forms. The method would work for any CDF with a closed form solution or numeric approximation, but since lognormal is the most common format for published generic data, CARP has only been programmed to handle the lognormal distribution. CARP will iteratively determine a percentile of the aggregate distribution using a recursive scheme with each input CDF. A Newton-Raphson type iteration is used to solve the following equation for the unknown X:


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* 


where:

$\Pr(A \le X)$ = The percentile of interest in the aggregate distribution. CARP will calculate the 5th, 50th, and 95th percentiles.

$n$ = The number of input distributions.

$A_i$ = The random variables of the input distributions.

$X$ = The failure rate occurring at the percentile of interest in the aggregate distribution.

$w_i$ = The weight of a distribution. We usually use equal weights, making this ordinarily equal to $1/n$.

The resultant aggregate is then fit to a lognormal distribution.
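
The percentile search can be sketched in a few lines. The sketch assumes that the aggregate CDF is the weighted sum of the input CDFs, $\Pr(A \le X) = \sum_{i=1}^{n} w_i \Pr(A_i \le X)$, which is consistent with the symbols defined above but is an assumption rather than a confirmed statement of CARP's equation. The function names are hypothetical, each input lognormal is described by its median and error factor, and a bisection safeguard is added around the Newton-Raphson step.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.95)  # about 1.645


def to_log_params(median, error_factor):
    """Lognormal (mu, sigma) from the median and error factor (95th/50th)."""
    return math.log(median), math.log(error_factor) / Z95


def mixture_cdf(x, params, weights):
    """Assumed aggregate CDF: weighted sum of the input lognormal CDFs."""
    y = math.log(x)
    return sum(w * STD_NORMAL.cdf((y - mu) / s)
               for w, (mu, s) in zip(weights, params))


def aggregate_percentile(p, sources, weights=None, tol=1e-12, max_iter=100):
    """Solve mixture_cdf(X) = p for X by safeguarded Newton-Raphson in ln X.

    sources: list of (median, error_factor); weights must sum to 1.
    """
    n = len(sources)
    weights = [1.0 / n] * n if weights is None else weights
    params = [to_log_params(m, ef) for m, ef in sources]
    zp = STD_NORMAL.inv_cdf(p)
    # The root lies between the smallest and largest input p-quantiles.
    quantiles = [mu + s * zp for mu, s in params]
    lo, hi = min(quantiles), max(quantiles)
    y = sum(w * q for w, q in zip(weights, quantiles))  # starting guess
    for _ in range(max_iter):
        g = mixture_cdf(math.exp(y), params, weights) - p
        if g > 0:
            hi = y
        else:
            lo = y
        slope = sum(w * STD_NORMAL.pdf((y - mu) / s) / s
                    for w, (mu, s) in zip(weights, params))
        y_next = y - g / slope if slope > 0 else 0.5 * (lo + hi)
        if not lo < y_next < hi:
            y_next = 0.5 * (lo + hi)  # fall back to bisection
        if abs(y_next - y) < tol:
            y = y_next
            break
        y = y_next
    return math.exp(y)


if __name__ == "__main__":
    sources = [(1.0e-6, 3.0), (5.0e-6, 10.0)]  # illustrative inputs only
    for p in (0.05, 0.50, 0.95):
        print(p, aggregate_percentile(p, sources))
```

Working in ln X keeps the iteration well scaled for failure rates that span several orders of magnitude.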

Bayesian 

The Bayesian update methodology of CARP is the commonly used method of conjugate priors. The gamma (Γ) distribution is used to perform the Bayesian calculations when the failure rate (per hour) is used. The α and β parameters that characterize the Γ distribution are related to the mean and variance of the lognormal as follows:
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The  posterior  mean  »nri  variants  axe  nairwiared  by: 


t+a 


o*. 


where: 


K — The  Of  fitfllWgJ, 

T The  exposure  rim* 

The  posterior  distribution  is  then  fit  to  a logno*-  •«»*  distribution. 
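
A minimal sketch of the conjugate update follows. It assumes the conventional rate parametrization of the gamma prior (mean α/β, variance α/β², so that β carries units of exposure time) and that the prior parameters are obtained by matching the mean and variance of the lognormal; both points are assumptions, and the function names are hypothetical.

```python
def gamma_from_mean_variance(mean, variance):
    """Moment-match a gamma prior: mean = alpha/beta, variance = alpha/beta**2."""
    beta = mean / variance
    alpha = mean * beta
    return alpha, beta


def bayesian_update(alpha, beta, failures, exposure):
    """Conjugate update for K failures observed in exposure time T."""
    return alpha + failures, beta + exposure


def gamma_mean_variance(alpha, beta):
    """Mean and variance of a gamma distribution in rate form."""
    return alpha / beta, alpha / beta ** 2


if __name__ == "__main__":
    # Illustrative prior only: mean 1e-5 per hour, variance 1e-10.
    alpha, beta = gamma_from_mean_variance(1.0e-5, 1.0e-10)
    alpha, beta = bayesian_update(alpha, beta, failures=2, exposure=1.0e5)
    print(gamma_mean_variance(alpha, beta))
```

With no failures (K = 0) the update only adds exposure to β, so the posterior mean falls below the prior mean, as expected.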

Transformation and Fitting of Distributions

In several different portions of the code, CARP needs to transform a CDF into another functional form (lognormal) which preserves some general properties of the original CDF.

Gamma to Lognormal

Following a Bayesian update, a gamma distribution (Γ(α,β)) is transformed into a lognormal (μ,σ) as follows:
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O T) 

*-±4-**- 

P 


General or Unspecified to Lognormal

Two parameters (any 2 of the mean, median, EF, variance, or percentiles) are required to define a lognormal distribution. The CARP code uses this characteristic and allows a choice between two methods for distribution fitting. The first, and the default method, Mean-EF, focuses on the knowledge of the data's central tendencies and, as shown in the event tree of Figure 2, places a priority toward using the mean or median. The second, called Lower-Upper, focuses on the bounding values and allows the central estimates to shift. As shown in Figure 3, this method places a priority toward using the bounding values or the error factor. Note that if the bounds are not available, this method then shifts to the logic shown in Figure 2.
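
The two fitting schemes can be illustrated for their simplest input pairs. This sketch does not reproduce the event trees of Figures 2 and 3; it only shows the standard lognormal relations for a fit from mean and error factor and for a fit from lower and upper bounds, assuming the bounds are the 5th and 95th percentiles and EF is the ratio of the 95th percentile to the median. The function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)  # about 1.645


def lognormal_summary(mu, sigma):
    """Mean, 5th, 50th, 95th percentiles and EF of a lognormal(mu, sigma)."""
    median = math.exp(mu)
    ef = math.exp(Z95 * sigma)
    return {
        "mean": math.exp(mu + 0.5 * sigma ** 2),
        "lower": median / ef,
        "median": median,
        "upper": median * ef,
        "ef": ef,
    }


def fit_mean_ef(mean, ef):
    """Mean-EF style fit: keeps the mean, derives the bounds."""
    sigma = math.log(ef) / Z95
    mu = math.log(mean) - 0.5 * sigma ** 2
    return lognormal_summary(mu, sigma)


def fit_lower_upper(lower, upper):
    """Lower-Upper style fit: keeps the bounds, lets the mean shift."""
    mu = 0.5 * (math.log(lower) + math.log(upper))
    sigma = (math.log(upper) - math.log(lower)) / (2.0 * Z95)
    return lognormal_summary(mu, sigma)


if __name__ == "__main__":
    print(fit_mean_ef(4.91e-6, 15.0))
    print(fit_lower_upper(1.96e-7, 4.42e-5))
```

A mean of 4.91-06 with an error factor of 15 gives a median near 1.27-06 and an upper bound near 1.90-05, which can be compared with CARP report rows that carry the same mean and EF.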
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Estimating  the  Exponential  Failure  Rate  from  Data  with  No  Failure 
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TO:  GaiyO^osg,  EnH*»  iv.r^ 

FKOA£  Op*  Mhiiiii  Shoowinn.  Pctft  Appjtm 1 

SUBJECT:  Zero  Failures 


Introduction

We recently used CARP to analyze some measured failure rate data where zero failures had occurred. The purpose of this memo is to explore the mathematics of how CARP apparently treats such a situation.


Cuuudean  I 

The basic mathematical work on confidence limits for test data from an exponential distribution (constant hazard) was by Epstein and Sobel (1), who showed that the variable

$$2r\,\frac{\lambda}{\hat{\lambda}} = 2nT\lambda \qquad (1)$$

where

r = number of failures,
n = number on test,
T = test hours,
$\lambda$ = failure rate,

has a $\chi^2$ distribution with $2r$ degrees of freedom at the lower bound and $(2r+2)$ degrees of freedom at the upper bound.

Also, it is well known that the maximum likelihood estimate ($\hat{\lambda}$) is given by

$$\hat{\lambda} = \frac{r}{nT}$$


Problem With Zero Failures

The problem with zero failures is that the maximum likelihood estimate ($\hat{\lambda}$) goes to zero and the lower confidence bound is no longer defined since the number of degrees of freedom of the $\chi^2$ distribution is zero. The upper confidence bound is still clearly defined, since the upper 95% bound on the variable $2nT\lambda$ for a $\chi^2$ distribution with 2 degrees of freedom is 5.991. Using this value, the upper bound on ($\lambda$) is given by

$$\lambda \le \frac{5.991}{2nT} \approx \frac{3}{nT} \qquad (2)$$

Since we are unable to define a lognormal distribution with a single data point, there is a difficulty in using CARP. Some analysts say to assume one failure and calculate a point estimate of the failure rate using $1/(nT)$. Welker and Lipow (2) have investigated this in depth and, based on a number of different statistical estimation principles, they suggest that one use

$$\hat{\lambda} = \frac{1}{3nT} \qquad (3)$$

as the point estimate.

Using the values given by Equations (2) and (3) as the upper confidence limit and the mean, we can define a distribution.
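
The two values can be checked by hand or with a few lines of code. The sketch below uses the fact that a chi-square variable with 2 degrees of freedom is exponential with mean 2, so its quantile is -2 ln(1 - p) and no statistical tables are needed. The function names are hypothetical, and treating an exposure of 67832 hours as n = 1 and T = 67832 is an assumption about how the exposure is split.

```python
import math


def chi2_2dof_quantile(p):
    """Quantile of chi-square with 2 degrees of freedom: -2 ln(1 - p)."""
    return -2.0 * math.log(1.0 - p)


def zero_failure_estimates(n_units, test_hours, confidence=0.95):
    """Point estimate 1/(3nT) and upper confidence bound chi2/(2nT)."""
    exposure = n_units * test_hours
    upper = chi2_2dof_quantile(confidence) / (2.0 * exposure)
    point = 1.0 / (3.0 * exposure)
    return point, upper


if __name__ == "__main__":
    print(chi2_2dof_quantile(0.95))
    point, upper = zero_failure_estimates(1, 67832)
    print(point, upper)
```

Using the rounded form 3/(nT) instead of 5.991/(2nT) changes the upper bound only in the third significant figure.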


Calculation Using CARP

Experiments with CARP for a specific example lead to the conclusion that CARP also uses the value $1/(3nT)$ for the mean and $3/(nT)$ for the upper 95% confidence limit.

The following example was entered into the CARP program:

The input data for the first run was:

Number of failures = 0
Exposure time = 67832 hours.

The results are shown in Table 1 and hand calculations verify that the upper confidence limit is computed from $3/(nT)$ and the mean is from $1/(3nT)$.

One can also verify, by taking the ratio of the 95% point to the 50% point or the ratio of the 50% point to the 5% point, that the error factor for a lognormal distribution determined by these two points is approximately 15.

This was also repeated by using as input the upper 95% confidence limit and the mean calculated from $3/(nT)$ and $1/(3nT)$, and CARP computed the same results, which are shown in Table 2 and are identical with Table 1. Therefore, we conclude that in the zero failure case CARP uses Equations (2) and (3).


References: 
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(2)  Welker,  Everett  and  Myron  Lipow,  "Estim*rrng  the  Exponential  Failure  Rate 

from Data with No Failure Events", Proceedings 1974 Reliability and

Maintainability Symposium, pp. 420-427.
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CARP  — DATA  ArJitVSIS  DETAIIZT  REPORT 

Component Type Code: A        Component Name: ZERO FAILURES TRF

Failure Mode Type Code: 2     Failure Mode: ESTIMATE λ USING ZERO FAILURES


                      D   MEAN      LOWER     MEDIAN    UPPER     EF

Plant-specific        L   4.91-06                       4.42-05
Interim aggregated                                      4.42-05
Aggregated generic    L   1.14-05   1.96-07   2.95-06   4.42-05   15.0
Bayesian updated
Final                 L   4.91-06   8.45-08   1.27-06   1.90-05   15.0


PLANT-SPECIFIC DATA

Units (N for demands, H for hours, etc.): H

Number of failures: (ZERO FAILURES ASSUMES 1/3 FAILURES)

Exposure (time or number of demands): 67832


BAYESIAN UPDATING

Bayesian updating performed: N


FINAL

Final basis (P,G,B): P

Lognormal fitting method used: MN-EF


AGGREGATION DETAILS

Aggregation method (T,A,G): T        Weighting method (E,I,P,U,S): E

     MEAN      LOWER     MEDIAN    UPPER     EF     FAILURES  EXPOSURE  WEIGHT

1    95TH AND ERROR FACTOR:  4.42-5, 15.0
     1.14-05   1.96-07   2.95-06   4.42-05   15.0                       1.000


TABLE 1

Results of using CARP to fit a lognormal distribution for the case of zero failures in 67832 hours
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casp  — DATA  AN&EX5XS  DExe.TT.Tn  REPORT 


Component Type Code: A        Component Name: ZERO FAILURES

Failure Mode Type Code: 2     Failure Mode: Using Mean and Upper


                      D   MEAN      LOWER     MEDIAN    UPPER     EF

Plant-specific
Interim aggregated    L   4.91-06                       4.42-05
Aggregated generic    L   4.91-06   8.44-08   1.27-06   1.90-05   15.0
Bayesian updated
Final                 L   4.91-06   8.44-08   1.27-06   1.90-05   15.0


PLANT-SPECIFIC DATA

Units (N for demands, H for hours, etc.):
Number of failures:

Exposure (time or number of demands):


BAYESIAN UPDATING

Bayesian updating performed: N


FINAL

Final basis (P,G,B): G

Lognormal fitting method used: MN-EF


AGGREGATION DETAILS

Aggregation method (T,A,G): T        Weighting method (E,I,P,U,S): E

     MEAN      LOWER     MEDIAN    UPPER     EF     FAILURES  EXPOSURE  WEIGHT

1    MEAN, UPPER:  4.91-6, 4.42-5
     4.91-06   8.44-08   1.27-06   1.90-05   15.0                       1.000


NOTE: Mean and Upper were calculated by using Mean = ((1/3)/T) and Upper = (3/T)


TABLE 2

Results of using CARP to fit a lognormal distribution for the case of
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Descriptors;  4]  1,  422,  410 

(ASQC  Lilcnimf  Symotr\-we*  *>•**  xca) 


Introduction

Assume an exponential failure pattern with unknown constant failure rate $\lambda$. Suppose that there are n failures in T operating part hours. The maximum likelihood and unbiased estimate of $\lambda$ is given by the formula

$$\hat{\lambda} = n/T, \qquad n = 0, 1, 2, \ldots$$

This estimate is routinely used except for the zero failure case. If no failures occur in the test, the estimate

$$\hat{\lambda} = 0/T = 0$$

is usually considered to be unsatisfactory in spite of the fact that it is an unbiased value derived by the maximum likelihood method. This point of view reflects the judgement that a non-zero failure rate really does apply but that the test time, T, has by chance been too short to exhibit a failure event. There is no generally accepted method for handling this zero failure problem. In this paper, we will describe some alternative approaches, with emphasis on methods which modify the maximum likelihood formula when n = 0 but leave it unchanged when n ≥ 1.

The Basic Mathematical Relationships Involved
In Estimating the Constant Failure Rate

Consider first an approach in which the maximum likelihood estimate is modified for n = 0 but is unchanged for n ≥ 1. The estimation formula can be written as

$$\hat{\lambda} = f(T)/T \quad \text{for } n = 0$$

and

$$\hat{\lambda} = n/T \quad \text{for } n = 1, 2, \ldots$$

where n is the number of failures observed in test time T. Thus the modification in the formula is expressed as a change from zero in the numerator to f(T). We will discuss a number of modifications in which f(T) is assigned a constant value independent of test time. It is obvious that we should restrict f(T) by the inequality

$$0 < f(T) \le 1.$$

The lower bound ensures that the estimate is positive. The upper bound assures that our failure rate estimate for zero failures is not greater than the estimate when a failure event occurs.

With respect to the population, the following formulas are pertinent. The time to failure density function is

$$u(t) = \lambda e^{-\lambda t}, \quad 0 < t < \infty$$

$$u(t) = 0, \quad -\infty < t < 0$$

The probability of n failures in T operating part hours is

$$P(n,T) = \frac{(\lambda T)^n e^{-\lambda T}}{n!}, \qquad n = 0, 1, 2, \ldots$$

Return to a consideration of the alternative approach in which the maximum likelihood estimate is used for all cases except the one in which no failures occur in a test time of T part hours. The estimator for zero failures is given by $\hat{\lambda} = k/T$ where k is the value of f(T), either a constant for all T or the value obtained by substituting the specific value of T in the function f(T).

The probabilities for the different values of $\hat{\lambda}$ are as follows.

Number of Failures      Estimate      Probability

0                       k/T           $e^{-\lambda T}$

n = 1, 2, ...           n/T           $(\lambda T)^n e^{-\lambda T}/n!$

The average or expected value of the estimator is

$$E(\hat{\lambda}) = \lambda + \frac{k}{T}\, e^{-\lambda T}$$

The variance of $\hat{\lambda}$ is

$$\sigma^2 = \frac{1}{T^2}\left[\lambda T - 2k\lambda T e^{-\lambda T} + k^2 e^{-\lambda T} - k^2 e^{-2\lambda T}\right]$$

The mean and variance of the maximum likelihood estimator, $\hat{\lambda}$, are obtained by letting k = 0 in the above expressions. This gives

$$E(\hat{\lambda}) = \lambda \quad \text{and} \quad \sigma^2 = \lambda/T$$

The bias in the modified estimator is

$$E(\hat{\lambda}) - \lambda = \frac{k}{T}\, e^{-\lambda T}$$
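
The closed-form mean and variance of the modified estimator can be checked numerically by summing over the Poisson probabilities directly. The sketch below does this; the function names and the trial values of the failure rate, test time, and k are illustrative assumptions only.

```python
import math


def poisson_pmf(n, lam_t):
    """Probability of n failures when the expected number is lam_t (> 0)."""
    return math.exp(-lam_t + n * math.log(lam_t) - math.lgamma(n + 1))


def moments_closed_form(lam, T, k):
    """Mean and variance of the estimator that uses k/T when n = 0."""
    q = math.exp(-lam * T)
    mean = lam + (k / T) * q
    var = (lam * T - 2.0 * k * lam * T * q + k * k * q * (1.0 - q)) / T ** 2
    return mean, var


def moments_by_summation(lam, T, k, n_max=200):
    """Same moments by direct summation over n = 0 .. n_max failures."""
    lam_t = lam * T
    values = [k / T] + [n / T for n in range(1, n_max + 1)]
    probs = [poisson_pmf(n, lam_t) for n in range(n_max + 1)]
    mean = sum(v * p for v, p in zip(values, probs))
    second = sum(v * v * p for v, p in zip(values, probs))
    return mean, second - mean ** 2


if __name__ == "__main__":
    lam, T, k = 1.0e-3, 2000.0, 1.0 / 3.0
    print(moments_closed_form(lam, T, k))
    print(moments_by_summation(lam, T, k))
    print("bias:", moments_closed_form(lam, T, k)[0] - lam)
```

The bias shrinks exponentially with the expected number of failures, so the choice of k matters mainly for short tests.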


Two Commonly Used Constant Replacements
For Zero in Estimating Failure Rate

The literature contains numerous instances in which the zero failure problem has been handled by using a somewhat arbitrarily selected number of failures in lieu of the observed number, zero. The frequently encountered selections are unity and one half. These two values are suggested on the basis of very similar logic; in a sense, each of them yields a reasonable upper bound on the failure rate estimate in the zero failure case. We have already observed that it would be illogical to replace zero failures by more than one failure, so unity is indeed
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ppFrarnfta  pane  m ank  not  filmed 


an  upper  bound  from  the  common  w roe  viewposr  1 'ifc  of  «>ne 
half  if  a direct  application  of  the  Yftes  corrector  rr.  * ccnmnuitv. 
Observed  data  yield  only  integral  numbers  of  faili : *1 : s©  one  i nturprata 
the  occurrence  of  rt  failure!  as  covering  a range  fru  t h ••  0.5  to  n ^ 0,5. 
Thus,  we  would  say  that  n-0  covers  the  range  frn  - i - 0 up  to  n »05 
and  therefore  n m 0,5  is  a logical  upper  bound  to  f ; ! si*.  :mimaring  the 
failure  rate  in  the  zero  failure  case. 


Therefore,  to;  -:era  failures.  the  use  of  the  upper  confidence  limit  for 
tne  point  eniM  vos  is  exprsasd  by  the  formula 


X * 


Evaluations  of  these  two  animates  will  be  pm  if:  it  id  later  in  this 
paper.  However,  we  would  like  to  comment  here  -x  ml  one  basic 
weakness  in  the  use  of  a constant  replacement  foi  n 1 c?  independent  ot 
the  time  duration  of  the  test.  Suppose  two  sepet-r  ;i  m* -as  have  brcn 
performed  with  failures  in  Tj  hours,  i * 1 and  Ti.  ! :i  s«  tests  gener- 
ate failure  rate  estimates  £j  * rtj/Tj.  Now  con  sic  I ; 1 :v  i^imato  gen- 
erated by  combining  the  experience  from  the  two  a ti . 


X » 


nf  * 22 

Tt  + T2 


Suppose  X-j  < X2-  It  is  easy  to  show  that  ^ < - Xj>*  k TO 

say.  combining  test  data  from  two  i«u  gives  an  in  n ■ ns?  mtervnedieta 
between  the  estimates  for  tha  roparate  tests.  Nov/  ■:::  ri  :clar  the  base  m 
which  « n2  * 0 and  we  use  e constant  k whi : ! i -thisfie*  0 -<  k 
< ! instead  of  zero.  Wb  then  have 

* k/T^,  X2  * k/T2and  A m k/(T<  !*  l;i), 


In  this  case,  X is  not  between  X^  and  X2  - rathe'  i*i  ::  -mailer  than 
either.  This  is  not  a serious  problem,  but  it  does  r:  ,n  :nvuite  a slight 


; n effect,  this  r *pi.Kas  v » 0 by 


f - 


There  is  no  acr  niriftnt  on  a a referred  confidence  level,  a,  but  the  values 
50  peram  an  i ISO  percent  team  to  oradominne.  Table  1 shows  the 
replacement  v'liufis  of  rl  for  these  two  confidence  levels  and  for  three 
other  tevet*  wv-sh  will  bu  nfewoed  In  the  to  follow. 


1 rt;;  SMronm 

2 « 
’Li® 

.i-l-3 

.5 

40 

.511 

SCI 

.593 

(50 

,916 

S3L2 

1.000 

TABLE  1. 


inconsistency. 

Before  we  leave  tha  subject  of  tha  use  of  a cu  * ; -itk  replacement 
for  zero  in  the  failure  rate  animation  formula,  w?  * s m.\  like  to  refer- 
erica  one  other  approach  which  is  similar  to  the  *r  !:i  :►  f 0 .5  as  dtsntmd 
above.  Suppose  we  agree  to  record  failure  rates  to  s : Iwrmai  places. 
Proper  rounding  procedures  would  yield  interpret!  ■:  -m i as  follows,  if 


X - .000002, 


we  would  intwM+*i  this  to  mean 

.0000015  < X < .000002!: 


The  weioc  p(?  it  included  in  Table  1 cover  a range  from  one  half  to 
one  m the  nym«r  of  failures  which  are  used  In  place  of  zero  in  the  fail- 
ure me  ttRsrrcr  orn  ton  mbs.  Since  j/2  incmesae  with  a,  it  would  be 
illogical  to  eonricisr  any  % larger  than  63 J2  percent.  At  the  low  end, 
wo  rather  aroii;;  w y ttutrped  at  a * 39.3  percent,  the  level  which  gen- 
erated the  fniir  i 5 replacement  value  of  0.5,  The  40  percent  level « 
shown  becausr  t i/  itie  closet*  level  to  392  which  rs  readily  avattole  in  - 
published  tabh  of  the  y?'  distribution.  Indeed  rt  has  been  suggested 
that  t twi  60  per  :!*rrr  lavnl  in  often  used  because  it  is  available  in  published 
rabies  and  it  if  ima  to  the  level  a • 632  percent  which  yields  a re- 
pdnoamam  of  r! : tv?  hv  unity. 

it  ItUigioai  to  Use  a Confidence  Limit 
As  a Point  Estimate? 


Wa  would  say,  therefore  that  in  tTO  zero  failure  ct  -is , 
X - ,000000 


really  means 


.000000  < x < .000000? 


and  the  seven  decimal  upper  bound  would  be  an  en ; ml  v appropriate 
estimate. 


The  Use  of  an  Uoper  Confidence  Li  mil  i ;i  of  a 
Point  Estimate  snthE  Zero  Fatiur;  j;  .-m 

Perhaps  the  most  common  approach  in  the  x i n : 1c il  ure  esse  ir  to 
raplecetheX  - 0 point  estlmata  by  an  upper  coivkhr, sh*  limit.  For 
n * f failures  in  T hours,  sn  upper  confidence  lirnr  ■ : < confidence  iaval 
ait  p.  vi),^. 


2 

Xo.  2f  + 2 


/2T. 


There is a fundamental difference between a point estimate and a
confidence limit. Therefore, it is appropriate to consider the implications
of this difference on the use of a confidence limit as an estimate
of the failure rate in the zero failure case. We recognize that any
estimation method is acceptable if it generates good answers regardless of
the purposes for which it was developed. However, analyses of the
original purposes and of the properties of the estimation method itself
may shed more light on the logic of the novel application under
consideration. For purposes of brevity and emphasis we will oversimplify
this discussion.

A point estimate is an answer to the following question. Based on
a specific set of sample observations, what is the best guess I can make
as to the single value of a particular population parameter? In our case
the parameter is the constant failure rate. On the other hand, a
confidence limit answers an entirely different question. For each possible
population parameter value, we ask the following question. If this were
the true population parameter, would a sample at least as good as the
observed one be likely or unlikely? The answers for all possible
parameter values are summarized by dividing them into two groups, one
containing all values for which the observed sample is likely and the other
containing the values for which it is unlikely. A boundary between these
two groups is a confidence limit. A useful approximate summary is as
follows. A point estimate is an answer to the question, given a sample,
what can I say about the population? A confidence limit, on the other
hand, is derived by considering the probabilities of obtaining certain
samples from various populations. We should note that the term "best
guess" must be defined for obtaining a point estimate and quantitative
levels for likely and unlikely must be specified in obtaining a
confidence limit.

The concepts of point estimate and confidence limit are illustrated
in Figure 1. A test for T hours yielded two failures giving a point
estimate of $\hat{\lambda}$ = 2/T. The 60 percent upper confidence limit is 3.11/T. A
sample at least as good in this case is one with zero, one or two failures.
The probability of such a sample is at least 1 - 0.60 = .40 if $\lambda$ < 3.11/T
and it is no more than 0.40 if $\lambda$ > 3.11/T. Thus, all values of $\lambda$ in the
range zero to 3.11/T are classified as being likely and those above
3.11/T are classified as unlikely at the 60 percent confidence level. Of
course the point estimate appears as a single point, the best guess of
the value of $\lambda$ based on the observed sample.
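The likely/unlikely classification in this example can be checked numerically. The sketch below is illustrative only; it assumes failures follow a Poisson distribution with mean $\lambda T$ (the constant failure rate model used throughout the paper), and the helper name is hypothetical.

```python
import math

def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson count with the given mean (here mean = lambda * T)."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(n + 1))

limit = 3.105  # 60 percent upper confidence limit times T for two failures

# Probability of a sample "at least as good" (two or fewer failures)
for lam_t in (1.0, 2.0, limit, 4.0, 6.0):
    print(lam_t, round(poisson_cdf(2, lam_t), 3))
```

Values of $\lambda T$ below the limit give a probability above .40, and values above it give a probability below .40, which is the boundary described in the text.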

It is apparent that the upper confidence limit is really quite
different from a point estimate. Indeed, it is much more logical to compare
or match the point estimate to the entire interval, 0 < $\lambda$ < 3.11/T,
rather than to focus attention on the upper limit of this interval. We
might well consider some point within the interval as a more appropriate
analogue of the point estimate, as, for example, the midpoint,
1.555/T. We wish to emphasize, however, that in spite of the fundamental
difference between the concepts of a point estimate and an
upper confidence limit, it is entirely appropriate to use the confidence
limit to derive a point estimate for the zero failure case if the estimate
has the required desirable properties.

It has been suggested in some reports that the 50 percent upper
confidence limit is appropriate in lieu of a point estimate because the
50 percent limit is just as likely to be above the true value as below it.
This reasoning appears to us a somewhat incorrect interpretation of
the basic confidence interval concept. Consider the following description
for the constant failure rate situation. For a population with
failure rate $\lambda$, the probability of n failures in T hours is

$$P(n) = \frac{(\lambda T)^n e^{-\lambda T}}{n!},$$

the maximum likelihood estimate is

$$\hat{\lambda} = n/T,$$

and the upper confidence limit on $\lambda$, denoted by $\lambda_U$, at level $\alpha$ is

$$\lambda_U = \chi^2_{\alpha,\,2n+2}/2T, \quad n = 0, 1, 2, \ldots$$

We can then say that the probability of obtaining an upper confidence
limit of

$$\lambda_U = \chi^2_{\alpha,\,2n+2}/2T$$

is

$$\frac{(\lambda T)^n e^{-\lambda T}}{n!}, \quad n = 0, 1, 2, \ldots$$

It can be shown that $\alpha$ is the probability that $\lambda_U > \lambda$.

Now consider the 50 percent level. It is true that, considering all
numbers of failures, n = 0, 1, 2, ..., 50 percent of the upper confidence
limits would be expected to exceed the true population rate,
and 50 percent would not. When there are no failures, the 50 percent
upper confidence limit is .693/T, and this is the suggested point
estimate when using the upper confidence limit in lieu of $\hat{\lambda}$ = 0.

Consider two categories of possible values of the true population
failure rate.

1.  If $\lambda$ < .693/T, the probability of zero failures in
    time T is greater than 0.5 so $\lambda_U$ = .693/T exceeds
    $\lambda$ more than half of the time.

2.  If $\lambda$ > .693/T, the probability of zero failures in
    time T is less than 0.5 so $\lambda_U$ = .693/T exceeds $\lambda$
    less than half of the time.

In order to get the even split of confidence limits, one-half above and
one-half below, we would have to include confidence limits for the
cases with failure events along with the zero failure value. Therefore,
regardless of the true $\lambda$, if we use the 50 percent upper confidence limit
as a point estimate only for the zero failure case, we do not have an
estimator which is as likely to be too high as too low.

A Solution For the Zero Failure Case Using a Modification
of the Upper Confidence Limit

Let us now consider some of the relationships between the maximum
likelihood point estimate and the upper confidence limit for the
constant failure rate case:

Point Estimate:          $\hat{\lambda} = n/T$
Upper Confidence Limit:  $\lambda_U = \chi^2_{\alpha,\,2n+2}/2T$
                         n = 0, 1, 2, ...

Certain significant facts about $\hat{\lambda}$ and $\lambda_U$ are revealed in the following
tabulation.

Number of                      $\lambda_U T$
Failures, n   $\hat{\lambda}T$   $\alpha$ = .40   $\alpha$ = .50   $\alpha$ = .60

    0         0        .51        .69        .92
    1         1       1.38       1.68       2.02
    2         2       2.29       2.67       3.11
    3         3       3.21       3.67       4.18

TABLE 2

For convenience, we have tabulated $\hat{\lambda}T$ and $\lambda_U T$ rather than $\hat{\lambda}$ and $\lambda_U$.
The upper confidence limit, $\lambda_U$, increases with the confidence level $\alpha$
for each n and it also increases with n for each $\alpha$. We also observe that,
for any particular $\alpha$, the net difference and the percentage difference
between $\hat{\lambda}$ and $\lambda_U$ approach zero as the number of failures increases.
This observation leads us to consider the following approach to the zero
failure case.
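The entries of Table 2 can be recomputed without chi-square tables. The sketch below is an illustration, not the authors' procedure: it assumes the Poisson form of the limit, in which $m = \lambda_U T$ is the mean for which n or fewer failures occur with probability $1 - \alpha$, and solves for m by bisection. The function names are hypothetical.

```python
import math

def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson count with the given mean."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(n + 1))

def upper_limit_times_t(n, alpha):
    """Solve P(N <= n | mean m) = 1 - alpha for m = lambda_U * T by bisection."""
    lo, hi = 0.0, 1.0
    while poisson_cdf(n, hi) > 1.0 - alpha:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n, mid) > 1.0 - alpha:
            lo = mid                           # probability still too high: m is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# One row per number of failures; columns are alpha = .40, .50, .60
for n in range(4):
    print(n, [round(upper_limit_times_t(n, a), 2) for a in (0.40, 0.50, 0.60)])
```

Bisection is used here only because it needs nothing beyond the standard library; a chi-square quantile routine from a statistics package would serve equally well.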


Consider the ratio between the point estimate, $\hat{\lambda}$, and the upper
confidence limit, $\lambda_U$, for a selected value of $\alpha$ and for various numbers
of failures, n. We first selected $\alpha$ = .60 and we computed $\lambda_U/\hat{\lambda}$ and
$\hat{\lambda}/\lambda_U$ for n = 0, 1, 2, ..., 10. The results are listed in Table 3.

   n     $\hat{\lambda}/\lambda_U$ ($\alpha$ = .60)

   0        0
   1       .494
   2       .644
   3       .719
   4       .764
   5       .795
   6       .817
   7       .834
   8       .846
   9       .859
  10       .868

TABLE 3

An examination of the ratios in Table 3 led us to consider the
possibility of modifying the zero failure estimate by some type of
extrapolation of the ratios for the cases with failures. The method is
illustrated in Figure 2. Consider first the upper curve which shows the
ratio $\lambda_U/\hat{\lambda}$ for n = 1, 2, ..., 10 with a smooth curve through the
points from n = 2 to n = 10. For n = 0, the curve would approach
the vertical axis as an asymptote for the maximum likelihood zero
failure estimate. In lieu of this, we decided to make an extrapolation of
the $\lambda_U/\hat{\lambda}$ curve back to the vertical axis, find the intercept, and compute
a modified estimate for the failure rate for n = 0. The curve shows a
straight line extrapolation, using the line through the points for n = 1
and n = 2. This line intersects the axis in the point (0, 2.5). The
60 percent upper confidence limit for n = 0 is .916/T. Therefore, a
modified $\hat{\lambda}$ for n = 0 is obtained by solving the equation (.916/T)/$\hat{\lambda}$
= 2.5 giving $\hat{\lambda}$ = .368/T. Using parabolic extrapolation through the
points for n = 1, 2, and 3, the estimate is .327/T. As we increase
the degree of the curve, we would approach the maximum likelihood
estimate $\hat{\lambda}$ = 0 as a limit. We chose to consider only the linear
extrapolation for simplicity. The lower curve in Figure 2 shows the
reciprocal ratio, $\hat{\lambda}/\lambda_U$, and the linear extrapolation of this curve gives a zero
failure estimate of $\hat{\lambda}$ = .316/T.

If this modification method is to be worthy of serious consideration
as a possible solution for the zero failure problem, we must establish
that it does have reasonably suitable properties. One immediate
question relates to the sensitivity of the estimate to the confidence
level. To check on this, we computed modified estimates for various
confidence levels, as shown in Table 4.

                     $\hat{\lambda}T$ Estimates Based On
Confidence Level,
     $\alpha$              $\lambda_U/\hat{\lambda}$       $\hat{\lambda}/\lambda_U$

    .40               .32         .30
    .50               .34         .31
    .60               .37         .32
    .63               .38         .32

TABLE 4

These estimates do cover the relatively small range from $\hat{\lambda}T$ = .30 to
$\hat{\lambda}T$ = .38, so sensitivity to the selection of $\alpha$ does not seem to be a
problem. We will discuss other properties of this estimate later.

The Use of Bayesian Statistics to Solve
the Zero Failure Problem

Since Bayesian statistical methods provide for the combination of
prior information or hypotheses with current test data, they offer a
plausible approach for solving the zero failure problem of interest here.
Therefore, we have considered Bayesian methods to obtain failure rate
estimates based on a number of prior distributions of the population
failure rate, $\lambda$. We will briefly review the principles of Bayesian
estimation and then we will summarize the results which were obtained.

The basic formula of the Bayesian method adapted to the failure
rate estimation problem is

$$w(\lambda \mid n = f) = \frac{P(n = f \mid \lambda)\, w(\lambda)}{P(n = f)}$$

where the symbols are defined as follows.

n  is a general symbol for the number of observed
   failures in T part hours

f  is the number of failures observed in a particular
   test of T part hours

$\lambda$  is the unknown constant part failure rate

$w(\lambda)$  is the prior distribution or the assumed
   density of $\lambda$

$P(n = f \mid \lambda)$  is the conditional probability of observing f
   failures, given the value of $\lambda$

$P(n = f)$  is the unconditional probability of observing
   f failures based on the assumed prior

$w(\lambda \mid n = f)$  is the posterior distribution of $\lambda$, conditional
   on the observation of f failures, assuming the
   prior, $w(\lambda)$.

The function $P(n = f)$ is determined by forming the joint density of n
and $\lambda$, setting n = f, and integrating on $\lambda$.

For present purposes, we need these functions for the exponential
population with failure rate $\lambda$. They are

$$P(n = f \mid \lambda) = \frac{(\lambda T)^f e^{-\lambda T}}{f!}.$$

The joint density of n, $\lambda$ is

$$P(n = f \mid \lambda)\, w(\lambda) = \frac{(\lambda T)^f e^{-\lambda T}}{f!}\, w(\lambda).$$

Therefore

$$P(n = f) = \int_{\lambda_1}^{\lambda_2} \frac{(\lambda T)^f e^{-\lambda T}}{f!}\, w(\lambda)\, d\lambda$$

where $\lambda_1$ and $\lambda_2$ are the endpoints of the range for which $w(\lambda) > 0$.
Then

$$w(\lambda \mid n = f) = \frac{\dfrac{(\lambda T)^f e^{-\lambda T}}{f!}\, w(\lambda)}{\displaystyle\int_{\lambda_1}^{\lambda_2} \frac{(\lambda T)^f e^{-\lambda T}}{f!}\, w(\lambda)\, d\lambda}.$$

To use this relationship, it is necessary to select a specific form
for $w(\lambda)$. We are interested in examining the properties of the posterior
distribution, $w(\lambda \mid n = f)$, corresponding to various selections for $w(\lambda)$.

A natural first choice is

$$w(\lambda) = a, \quad 0 < \lambda < 1/a$$
$$\phantom{w(\lambda)} = 0 \quad \text{for all other } \lambda.$$
This prior generates the posterior

$$w(\lambda \mid n = f) = \frac{(\lambda T)^f e^{-\lambda T}/f!}{\displaystyle\int_0^{1/a} \left[(\lambda T)^f e^{-\lambda T}/f!\right] d\lambda}.$$

The integral in the denominator can be written in terms of the
incomplete gamma function denoted by I(u,p) and defined by the integral
(2, p. V)

$$I(u,p) = \frac{1}{\Gamma(p+1)} \int_0^{u\sqrt{p+1}} v^p e^{-v}\, dv.$$

By appropriately expressing the denominator in the form of the
incomplete gamma function and simplifying the resulting expression, we
obtain

$$w(\lambda \mid n = f) = \frac{T\,(\lambda T)^f e^{-\lambda T}}{\Gamma(f+1)\; I\!\left(T/a\sqrt{f+1},\; f\right)}.$$

It is customary to use the mean of the posterior distribution of $\lambda$ as an
estimate, $\hat{\lambda}$. In this case, then,

$$\hat{\lambda} = \int_0^{1/a} \lambda\, w(\lambda \mid n = f)\, d\lambda$$

which, upon integration becomes

$$\hat{\lambda} = \frac{(f+1)\; I\!\left(T/a\sqrt{f+2},\; f+1\right)}{T\; I\!\left(T/a\sqrt{f+1},\; f\right)}.$$

We decided to use two values of the constant a in this study. It was
natural to let a = T since this restricts $\lambda$ to the range from zero to
1/T. As a matter of academic interest, we did want
to consider the limiting case as a approaches zero. We then obtain
the following estimation formulas.

For a = T,

$$\hat{\lambda} = \frac{(f+1)\; I\!\left(1/\sqrt{f+2},\; f+1\right)}{T\; I\!\left(1/\sqrt{f+1},\; f\right)}.$$

For a → 0,

$$\hat{\lambda} = \frac{f+1}{T}.$$

For small numbers of failures, f, the following estimates are obtained
for these priors.

            $\hat{\lambda}$            $\hat{\lambda}$
   f       a = T         a → 0

   0      .412/T         1/T
   1      .506/T         2/T
   2      .610/T         3/T
   3      .744/T         4/T

It was decided to look also at the gamma prior density

$$w(\lambda) = \frac{k^{a+1} \lambda^a e^{-k\lambda}}{\Gamma(a+1)}, \quad k > 0,\; a > -1,\; 0 < \lambda < \infty.$$

Using the same mathematical relationships as before, we obtain

$$w(\lambda \mid n = f) = \frac{(T+k)^{a+f+1}\, \lambda^{a+f}\, e^{-(T+k)\lambda}}{\Gamma(a+f+1)}$$

and the failure rate estimate

$$\hat{\lambda} = \frac{a+f+1}{T+k}.$$

Since this paper is concerned with the zero failure case, we were
naturally led to consider priors with a maximum at or near $\lambda$ = 0. The
simplest prior of this type is represented by a straight line joining the
point $\lambda$ = 0, w(0) = 2T to the point $\lambda$ = 1/T, w(1/T) = 0, the
equation being

$$w(\lambda) = 2T - 2T^2\lambda.$$

Using the formulas given above and supposing a test with no failures in
T part hours, the resulting posterior density is

$$w(\lambda \mid n = 0) = e^{-\lambda T + 1}\,(T - \lambda T^2).$$

The corresponding failure rate estimate is

$$\hat{\lambda} = E(\lambda) = \int_0^{1/T} \lambda\, e^{-\lambda T + 1}\,(T - \lambda T^2)\, d\lambda = (3 - e)/T = .28/T.$$

As a natural next step, we considered the use of the above posterior
as a prior. If we let

$$w(\lambda) = e^{-\lambda T + 1}\,(T - \lambda T^2),$$

and if we suppose T test hours with no failures, we obtain the posterior

$$w(\lambda \mid n = 0) = \frac{e^{-2\lambda T + 1}\,(T - \lambda T^2)}{.25\,(e + e^{-1})}.$$

As above, the corresponding failure rate estimate is

$$\hat{\lambda} = \frac{2}{T\,(e^2 + 1)} = .2384/T.$$

We can repeat this process, using the last derived posterior as a
prior, generating a new posterior and a new $\hat{\lambda}$, and continuing to derive
additional estimates. Use the notation $w_0(\lambda)$ for the original straight
line prior, $w_1(\lambda)$ the posterior derived from $w_0(\lambda)$, $w_2(\lambda)$ the posterior
derived by using $w_1(\lambda)$ as a prior, and so on. The following results are
obtained.
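The repeated updating of the straight line prior can be checked by numerical integration. The sketch below is illustrative and not taken from the paper: it works in the scaled variable $x = \lambda T$ on the interval 0 to 1, uses a simple midpoint rule, and the function names are hypothetical. With zero failures each test of T hours multiplies the density by $e^{-x}$.

```python
import math

def posterior_mean_times_t(prior, failures=0, steps=20000):
    """Posterior mean of x = lambda * T on [0, 1] by the midpoint rule.

    `prior` is an unnormalized prior density in x; the likelihood of
    `failures` failures in T hours is proportional to x**failures * exp(-x).
    """
    h = 1.0 / steps
    num = den = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        weight = prior(x) * x**failures * math.exp(-x)
        num += x * weight
        den += weight
    return num / den

def straight_line(x):
    """The prior w0 = 2T - 2T^2 * lambda, written in terms of x = lambda * T."""
    return 2.0 - 2.0 * x

# Mean of w1: one zero-failure test applied to the straight line prior
w1_mean = posterior_mean_times_t(straight_line)
# Mean of w2: a second zero-failure test, using w1 (up to a constant) as the prior
w2_mean = posterior_mean_times_t(lambda x: straight_line(x) * math.exp(-x))

print(round(w1_mean, 4), round(w2_mean, 4))
```

Under these assumptions the two means agree with the closed forms $(3 - e)/T$ and $2/T(e^2 + 1)$ given in the text, and further repetitions only require multiplying the prior by another factor of $e^{-x}$.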


Discussion

This paper has considered alternatives to the method of maximum
likelihood for estimating an unknown constant failure rate from data
consisting of the number of failures, n, observed in T part hours of
operation. Of course, the maximum likelihood estimator, $\hat{\lambda}$ = n/T,
is entirely acceptable for n ≥ 1. However, for the zero failure case,
the estimate $\hat{\lambda}$ = 0 is presumed to be too low. We will discuss the
alternatives which have been described previously and suggest a
preferred procedure. Modifications in the estimating formula will be
expressed as replacements of n = 0 failures in the likelihood estimator,
$\hat{\lambda}$ = n/T, by non-zero values of n generated by the various alternatives.
These modifications are presented in Table 6.

For an initial consideration of the alternatives, let us consider the
logic underlying each of the approaches. The first method involves the
replacement of n = 0 by a constant which is an upper bound in some
sense. The selection of such a constant does introduce an element of
arbitrariness and it will be granted that this is really true for the other
methods as well. It is somewhat illogical to use an upper bound in
place of a point estimate. By its very nature, a point estimate is an
average or measure of central tendency and we therefore usually prefer
a "best estimate" rather than an upper or lower bound.

The upper confidence limit was used as the basis for three different
methods. The first one used the upper confidence limit directly as a
point estimate, a procedure frequently encountered in the literature.
The second one used the midpoint of the accept interval as the point
estimate. The third method involves an extrapolation using a ratio
relationship between the upper confidence limits and the point
estimates for the non-zero failure cases. There are two basic objections to
the use of the upper confidence limit itself as a point estimate. The
confidence limit is an upper bound and not a "best estimate" and it is
very sensitive to the choice of confidence level. Both of these objections
are to a large extent removed by the use of the midpoint of the "accept"
confidence interval which ranges from zero to the upper confidence
limit.

The extrapolation of the ratios between $\lambda_U$ and $\hat{\lambda}$ for n ≥ 1 back to the
n = 0 axis seems to have an intuitively natural appeal. Such an
extrapolation appears to establish a consistency between the estimates based
on tests with failure events and the modification for the zero failure
case.

Bayesian estimation methods are of course attractive since the
typical concepts of Bayesian statistics are expressed in terms of
combining new information with previously conceived ideas about the
parameters being estimated. The only serious weakness in the Bayesian
approach for the present application is the extreme sensitivity to the
choice of the prior; almost any answer can be generated by selecting
a suitable prior.

In selecting confidence levels and priors to be used in
numerical calculations for zero failures, we attempted to cover a
reasonably complete and logical range of alternatives as discussed
previously in the paper. We can examine the entire collection of these
numerical results to gain more insight into the sensitivity of the answers to
the estimation method and to the selections of confidence levels and
priors. The numerical replacements shown in Table 6 are ordered
in Table 7 without reference to method of derivation. The "zero plus
5 in an extra decimal place" is necessarily omitted.

SUMMARY OF MODIFICATIONS OF
ZERO FAILURE ESTIMATES

(REPLACEMENT FOR ZERO AS NUMBER OF FAILURES)

Constant Replacements

    Unity                                      1.00
    One Half                                    .50
    Zero plus 5 in an extra decimal place

Replacements Based on Confidence Limits

    Level                  .40     .50     .60     .63
    Upper Limit            .51     .69     .92    1.00
    Interval Midpoint      .26     .35     .46     .50
    Extrapolated           .32     .34     .37     .38

Bayesian Estimates

    Prior                                                    Estimate
    Uniform, 0 to 1                                             .41
    Uniform, 0 to ∞                                            1.00
    $2T - 2T^2\lambda$ (used only as a prior in the text)             .33
    $e^{-\lambda T + 1}(T - \lambda T^2)$                                  .28
    $4e^{-2\lambda T}(T - \lambda T^2)/(1 + e^{-2})$                       .24

TABLE 6
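The confidence-limit rows of this summary can be recomputed from the definitions given earlier. The sketch below is an illustration under stated assumptions, not the authors' own calculation: limits are found from the Poisson form of the upper confidence limit by bisection, the "extrapolated" values use the straight line through the ratios at n = 1 and n = 2, and all function names are hypothetical.

```python
import math

def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson count with the given mean."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(n + 1))

def upper_limit_times_t(n, alpha):
    """lambda_U * T at level alpha: the mean m with P(N <= n | m) = 1 - alpha."""
    lo, hi = 0.0, 1.0
    while poisson_cdf(n, hi) > 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n, mid) > 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def replacements(alpha):
    """Upper limit, interval midpoint, and the two extrapolated zero replacements."""
    u0, u1, u2 = (upper_limit_times_t(n, alpha) for n in (0, 1, 2))
    # A line through (1, r1) and (2, r2) meets the n = 0 axis at 2*r1 - r2.
    intercept_up = 2.0 * (u1 / 1.0) - (u2 / 2.0)     # ratio lambda_U / lambda_hat
    intercept_down = 2.0 * (1.0 / u1) - (2.0 / u2)   # ratio lambda_hat / lambda_U
    return u0, u0 / 2.0, u0 / intercept_up, u0 * intercept_down

for alpha in (0.40, 0.50, 0.60, 1.0 - math.exp(-1.0)):
    print(round(alpha, 3), [round(v, 2) for v in replacements(alpha)])
```

The last level, $1 - e^{-1}$, is the 63.2 percent level at which the zero-failure upper limit equals exactly one failure.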
Ordered Non-Zero Failure Replacements

    .24      .33      .50
    .26      .34      .51
    .28      .35      .69
    .30      .37      .92
    .31      .38     1.00
    .32      .41     1.00
    .32      .46     1.00
    .32      .50

TABLE 7


There is a rather smooth progression from .24 to .51, followed by
wide intervals from there to the upper value, 1.00. The five highest
values come from the upper confidence limit used as a point estimate,
unity as an arbitrary choice, and an unrealistic Bayesian prior.

Recall that our objective is to discard the maximum likelihood
estimate of $\lambda$ for at least the one case in which no failures have
occurred and replace it by some substitute which is judged to be
reasonable. For an estimate to be reasonable it must have a likely value and
of course logical derivation would increase its credibility. Previously
presented arguments have essentially established that the five highest
values are derived by the least logical methods. Since these five are all
markedly above the next lower value, it is concluded that we are
justified in discarding them from further consideration and thus we focus
our attention on the remaining 18 which are spaced quite evenly from a
low of .24 to a high of .51. Seventeen of these estimates are derived
as midpoints of confidence intervals, or as extrapolations of ratios
involving confidence limits and point estimates, or by Bayesian
estimation methods. Previously presented arguments suggest that each of
these approaches is basically logical.

Consider now the numerical values of these 18 replacements for
zero failures. We have explained why we wish to use a replacement
which is greater than zero and not more than unity. The 18 values all
fall in a portion of this range which we believe to be entirely reasonable.
Since we are dealing with the zero failure case, it is realistic to select a
value in the lower half. It is conservative to select a value in the upper
portion of this lower half. This logic suggests a zero replacement k
satisfying the inequality:

$$.24 < k < .51,$$

the range between the extremes of our 18 values. The choice of a
single value to replace the observed n = 0 failures is necessarily arbitrary.
Our preference is to assume the value one third, giving the zero failure
estimator

$$\hat{\lambda} = 1/3T.$$

This zero replacement value, 1/3, is near the median of the 18 values
listed in Table 7; it is close to the midpoint of the logical range covered by
the values, and it is certainly easy to remember.
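The closing remark about the median and the midpoint can be checked with a few lines. This is only an illustrative check; the list holds the 18 retained replacement values as listed in Table 7 (the 23 ordered values less the five highest), and the variable names are hypothetical.

```python
from statistics import median

# The 18 retained zero replacements, in the order given in Table 7
values = [.24, .26, .28, .30, .31, .32, .32, .32, .33,
          .34, .35, .37, .38, .41, .46, .50, .50, .51]

center = median(values)                        # middle of the ordered values
midrange = (min(values) + max(values)) / 2.0   # midpoint of the range .24 to .51

print(len(values), round(center, 3), round(midrange, 3), round(1 / 3, 3))
```

The suggested value of one third sits essentially at the median and a little below the midpoint of the range.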

Conclusions

It is our conclusion that an upper confidence limit is an
unsatisfactory substitute for a point estimate of a constant failure rate,
especially because of its sensitivity to the confidence level. However, a
more reasonable point estimate can be developed as the midpoint of
a confidence interval or as an extrapolation of ratios between upper
confidence limits and point estimates for cases with failure events.
Bayesian statistical methods are appropriate, but sensitivity to the
choice of the prior is a serious handicap. We did not find any
reasonable substitute for the maximum likelihood estimator in cases where
failures did occur. For the zero failure case, we recommend the
assumption of one third of a failure, giving the failure rate estimate

$$\hat{\lambda} = 1/3T.$$

The arguments which we have presented indicate that this estimator is
numerically reasonable and that it is generated by rather logical
methods. It is recognized that there is no single correct solution to this
problem. We are merely trying to replace a maximum likelihood
estimate which we believe to be highly unrealistic by a replacement
which is numerically more acceptable and we believe that the suggested
estimator is appropriate.
Technical Appendix

What is Sampling?

Introduction

Sampling is used in an @RISK simulation to generate possible values from
probability distribution functions. These sets of possible values are then used to
evaluate your Excel worksheet. Because of this, sampling is the basis for the
hundreds or thousands of "what if" scenarios @RISK calculates for your
worksheet. Each set of samples represents a possible combination of input values
which could occur. Choosing a sampling method affects both the quality of
your results and the length of time necessary to simulate your worksheet.

Sampling is the process by which values are randomly drawn from input
probability distributions. Probability distributions are represented in @RISK by
probability distribution functions and sampling is performed by the @RISK program.
Sampling in a simulation is done repetitively, with one sample drawn every
iteration from each input probability distribution. With enough iterations, the
sampled values for a probability distribution will become distributed in a manner
which approximates the known input probability distribution. The statistics of
the sampled distribution (mean, standard deviation and higher moments)
will approximate the true statistics that were input for the distribution. The graph
of the sampled distribution will even look like a graph of the true input
distribution.

Statisticians and practitioners have developed several techniques for drawing
random samples. The important factor to examine when evaluating sampling
techniques is the number of iterations required to accurately recreate an input
distribution through sampling. Accurate results for output distributions depend
on a complete sampling of input distributions. If one sampling method requires
more iterations and longer simulation runtimes than another to approximate input
distributions, it is the less "efficient" method.

The two methods of sampling used in @RISK, Monte Carlo sampling and
Latin Hypercube sampling, differ in the number of iterations required until
sampled values approximate input distributions. Monte Carlo sampling often
requires a large number of samples to approximate an input distribution, especially
if the input distribution is highly skewed or has some outcomes of low
probability. Latin Hypercube sampling, a new sampling technique used in @RISK,
forces the samples drawn to correspond more closely with the input distribution
and thus converges faster on the true statistics of the input distribution.
Sampling Methods

Cumulative Distribution

It is often helpful, when reviewing different sampling methods, to first
understand the concept of a cumulative distribution. Any probability distribution may
be expressed in cumulative form. A cumulative curve is typically scaled from 0
to 1 on the Y-axis, with Y-axis values representing the cumulative probability up
to the corresponding X-axis value.


[Figure: cumulative distribution curve; horizontal axis "Distribution Value", marked Minimum, X1 and Maximum]

In the cumulative curve above, the .5 cumulative value is the point of 50%
cumulative probability (.5 = 50%). Fifty percent of the values in the distribution fall
below this median value and 50% are above. The 0 cumulative value is the
minimum value (0% of the values will fall below this point) and the 1.0
cumulative value is the maximum value (100% of the values will fall below this point).

Why is this cumulative curve so important to understanding sampling? The 0 to
1.0 scale of the cumulative curve is the range of the possible random numbers
generated during sampling. In a typical Monte Carlo sampling sequence, the
computer will generate a random number between 0 and 1, with any number in
the range equally likely to occur. This random number is then used to select a
value from the cumulative curve. For the example above, if a random number
of .5 was generated during sampling, the value sampled for the distribution
shown would be X1. As the shape of the cumulative curve is based on the
shape of the input probability distribution, more likely outcomes will be more
likely to be sampled. The more likely outcomes are in the range where the
cumulative curve is the "steepest".
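The way a random number between 0 and 1 selects a value from the cumulative curve can be sketched in a few lines. This is a generic illustration of inverse-transform sampling, not @RISK's own code; the exponential input distribution, the rate of 2.0, and the function names are assumptions chosen only because the cumulative curve can then be inverted in closed form.

```python
import math
import random

def exponential_inverse_cdf(u, rate):
    """Value x at which the cumulative curve 1 - exp(-rate * x) reaches height u."""
    return -math.log(1.0 - u) / rate

def sample_exponential(rate, rng):
    """Draw a uniform random number on the 0-to-1 scale and read off the value."""
    return exponential_inverse_cdf(rng.random(), rate)

rng = random.Random(12345)
draws = [sample_exponential(2.0, rng) for _ in range(10000)]

# A random number of .5 selects the median of the distribution
print(exponential_inverse_cdf(0.5, 2.0))
# With many iterations the sample mean should be near the true mean 1 / rate
print(sum(draws) / len(draws))
```

Because the curve is steepest where outcomes are most likely, a given stretch of the 0-to-1 scale maps onto a narrow range of values there, so those values are drawn more often.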
Monte Carlo Sampling

Monte Carlo sampling refers to the traditional technique for using random or
pseudo-random numbers to sample from a probability distribution. The term
"Monte Carlo" was introduced during World War II as a code name for
simulation of problems associated with development of the atomic bomb. Today,
Monte Carlo techniques are applied to a wide variety of complex problems
involving random behavior. A wide variety of algorithms are available for
generating random Monte Carlo samples from different types of input probability
distributions.

Monte Carlo sampling techniques are entirely random; that is, any given
sample may fall anywhere within the range of the input distribution. Samples, of
course, are more likely to be drawn in areas of the distribution which have higher
probabilities of occurrence. In the cumulative distribution shown earlier, each
Monte Carlo sample will use a new random number between 0 and 1. With
enough iterations, Monte Carlo sampling will "recreate" the input distributions
through sampling. A problem of clustering, however, arises when a small
number of iterations are performed.
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Technical  Anoendix 


In the illustration shown here, each of the 5 samples drawn falls in the middle of
the distribution. The values in the outer ranges of the distribution are not
represented in the samples and thus their impact on your results is not included in
your simulation output.

Clustering becomes especially pronounced when a distribution includes low
probability outcomes which could have a major impact on your results. It is
important to include the effects of these low probability outcomes and, to do
this, these outcomes must be sampled. But if their probability is low enough,
a small number of Monte Carlo iterations may not sample sufficient quantities of
these outcomes to accurately represent their probability. This problem has led to
the development of stratified sampling techniques such as the Latin Hypercube
sampling used in @RISK.
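
The Monte Carlo procedure and the clustering problem described above can be sketched
in Python. This is a minimal illustration, not @RISK's implementation: the standard
normal input distribution, the seed, and the helper name `tail_fraction` are all
assumptions made for the example. Each sample maps a new uniform random number
between 0 and 1 through the inverse of the cumulative curve.

```python
import random
from statistics import NormalDist

def monte_carlo_samples(n, dist=NormalDist(0.0, 1.0), seed=0):
    """Draw n samples by inverting the cumulative curve at uniform random points."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.random()            # new random number in [0, 1) for every sample
        while u == 0.0:             # inv_cdf is undefined at exactly 0
            u = rng.random()
        samples.append(dist.inv_cdf(u))
    return samples

def tail_fraction(samples, dist=NormalDist(0.0, 1.0), tail=0.05):
    """Fraction of samples in the lower or upper `tail` of the distribution."""
    lo, hi = dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)
    return sum(1 for x in samples if x < lo or x > hi) / len(samples)

if __name__ == "__main__":
    # Small runs can miss the outer 10% of the distribution entirely.
    for n in (5, 50, 5000):
        print(n, tail_fraction(monte_carlo_samples(n)))
```

With only a handful of iterations the reported tail fraction can be far from the
nominal 0.10, which is the clustering effect the text describes.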


Latin Hypercube Sampling

Latin Hypercube sampling is a recent development in sampling technology designed
to accurately recreate the input distribution through sampling in fewer iterations
when compared with the Monte Carlo method. The key to Latin Hypercube sampling is
stratification of the input probability distributions. Stratification divides the
cumulative curve into equal intervals on the cumulative probability scale (0 to
1.0). A sample is then randomly taken from each interval or "stratification" of the
input distribution. Sampling is forced to represent values in each interval, and
thus, is forced to recreate the input probability distribution.

In the illustration above, the cumulative curve has been divided into 5 intervals.
During sampling, a sample is drawn from each interval. Compare this to the 5
clustered samples drawn using the Monte Carlo method. With Latin Hypercube,
the samples more accurately reflect the distribution of values in the input
probability distribution.

The technique being used during Latin Hypercube sampling is "sampling without
replacement". The number of stratifications of the cumulative distribution is
equal to the number of iterations performed. In the example above there were 5
iterations and thus 5 stratifications were made to the cumulative distribution. A
sample is taken from each stratification. However, once a sample is taken from a
stratification, this stratification is not sampled from again, as its value is
already represented in the sampled set.


How does sampling within a given stratification occur? In effect, @RISK
chooses a stratification for sampling, then randomly chooses a value from within
the selected stratification.
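
The stratified draw described above can be sketched as follows. This is an
illustrative Python sketch under stated assumptions (a standard normal input
distribution and one variable), not the algorithm as implemented in @RISK: the
cumulative scale is cut into n equal strata, the strata are visited in random order
without replacement, and one value is drawn at random inside each stratum.

```python
import random
from statistics import NormalDist

def latin_hypercube_samples(n, dist=NormalDist(0.0, 1.0), seed=0):
    """Draw n samples, one from each of n equal-probability strata, in random order."""
    rng = random.Random(seed)
    strata = list(range(n))
    rng.shuffle(strata)                       # each stratum is used exactly once
    samples = []
    for k in strata:
        u = (k + rng.random()) / n            # random point inside [k/n, (k+1)/n)
        u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep inv_cdf away from exactly 0 or 1
        samples.append(dist.inv_cdf(u))
    return samples

if __name__ == "__main__":
    print(sorted(latin_hypercube_samples(5)))
```

For several input variables, the same function would be called once per variable
with an independent shuffle, so that the stratum used for one variable in a given
iteration says nothing about the stratum used for another.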


When using the Latin Hypercube technique to sample from multiple variables,
it is important to maintain independence between variables. The values
sampled for one variable need to be independent of those sampled for another
(unless, of course, you explicitly want them correlated). This independence is
maintained by randomly selecting, for each variable, which interval to
draw a sample from. In a given iteration, Variable #1 may be sampled from
stratification #4, Variable #2 may be sampled from stratification #22, and so on.
This preserves randomness and independence and avoids unwanted correlation
between variables.

Latin Hypercube and Low Probability Outcomes

As a more efficient sampling method, Latin Hypercube offers great benefits in
terms of increased sampling efficiency and faster runtimes. These are especially
noticeable in a PC based simulation environment such as @RISK. Latin
Hypercube, however, also helps the analysis of situations where low probability
outcomes are represented in input probability distributions. By forcing the
sampling of the simulation to include these outlying events, Latin Hypercube
sampling assures they are accurately represented in your simulation outputs.

When low probability outcomes are very important it often helps to run an analysis
which just simulates the contribution to the output distribution from the low
probability events. In this case the model simulates only the occurrence of low
probability outcomes; they are set to 100% probability. Through this you will
isolate those outcomes and directly study the results they generate.

Testing the Techniques

The concept of convergence is used to test a sampling method. At the point of
convergence, the output distributions are stable; that is, additional iterations
will not markedly change the shape or statistics of the sampled distribution. The
sample mean versus the true mean is typically a measure of convergence; however,
skewness, percentile probabilities and other statistics are often used in
measuring convergence.

@RISK provides a good environment for testing the speed at which the two
available sampling techniques converge on an input distribution. Run an equal
number of iterations with each of the sampling techniques while selecting an input
distribution @function as a simulation output. Look at the closeness of the
sample statistics to the true statistics which were specified in the distribution
@function. It should be evident that Latin Hypercube sampling converges faster
on the true distributions when compared with Monte Carlo sampling.
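
The convergence test suggested above can be imitated outside @RISK. The sketch
below is a hedged Python illustration: the normal input distribution with mean 10
and standard deviation 2, the number of repeated runs, and all function names are
assumptions for the example. It compares how close the sample mean comes to the
true mean for the same number of iterations under each sampling method.

```python
import random
from statistics import NormalDist, fmean

DIST = NormalDist(mu=10.0, sigma=2.0)    # assumed input distribution, true mean 10

def _clamp(u):
    """Keep u strictly inside (0, 1) so that inv_cdf is defined."""
    return min(max(u, 1e-12), 1.0 - 1e-12)

def monte_carlo(n, rng):
    return [DIST.inv_cdf(_clamp(rng.random())) for _ in range(n)]

def latin_hypercube(n, rng):
    strata = list(range(n))
    rng.shuffle(strata)                   # one draw per stratum, random order
    return [DIST.inv_cdf(_clamp((k + rng.random()) / n)) for k in strata]

def mean_abs_error(sampler, n, runs=100, seed=0):
    """Average |sample mean - true mean| over repeated runs of n iterations."""
    rng = random.Random(seed)
    return fmean(abs(fmean(sampler(n, rng)) - DIST.mean) for _ in range(runs))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, mean_abs_error(monte_carlo, n), mean_abs_error(latin_hypercube, n))
```

Other statistics named in the text, such as skewness or percentiles, could be
compared in the same way by swapping the statistic inside `mean_abs_error`.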
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Sampling  Methods 


More  About  Sampling  Techniques 

The academic and technical literature has addressed both Monte Carlo and Latin
Hypercube sampling. Any of the references to simulation in the Recommended
Readings Appendix at the end of this chapter will give the reader an introduction
to Monte Carlo sampling. References which specifically address Latin Hypercube
sampling are included in a separate section of these references.
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Appendix  F: 

Text  of  MSFC  Incident  Reports  for  post -Galileo  Major  Incidents 
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0.02  INCH  DEEP  AND  TOO  SMALL  TO  PERFORM  METALLURGICAL 
ANALYSIS.  TO  DATE,  THE  POROSITY  IN  THE  FAILED  BLADE  FROM 
HPFTP 5602R1 IS THE MOST EXTENSIVE EXPERIENCED.

ENGINE 0215 HPFTP U/N 5602R1 HAD ACCUMULATED 61 STARTS (66%
FLEET LEADER) AND 25,143 SECONDS (64% FLEET LEADER). THE MAIN
HOUSING INNER RING HAD TWO TYPE "A" CRACKS AND SIX ADDITIONAL
CRACKS. THE HPFTP TURBINE SUPPORT HAD A HISTORY OF EXTENSIVE
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1993  STS  Catastrophic  Failure  Frequency  Simulation 

Forecast: 93 STS Failure Frequency                          Cell:

Summary:
   Display Range is from 0.00E+0 to 4.00E-2 per Launch
   Entire Range is from 1.54E-3 to 1.62E-1 per Launch
   After 14,020 Trials, the Std. Error of the Mean is 0.00

Statistics:                  Display Range    Entire Range
   Trials                    13675            14020
   Mean                      1.28E-02         1.33E-02
   Median (exact)            1.09E-02         1.11E-02
   Mode (exact)              1.54E-03         1.54E-03
   Standard Deviation        7.29E-03         1.01E-02
   Variance                  5.31E-05         1.01E-04
   Skewness                  1.24             (unavailable)
   Kurtosis                  4.36             (unavailable)
   Coeff. of Variability     0.57             0.73
   Range Minimum             0.00E+00         1.54E-03
   Range Maximum             4.00E-02         1.62E-01
   Range Width               4.00E-02         1.60E-01
   Mean Std. Error           6.23E-05         6.50E-05

Percentiles for Entire Range (per Launch):

   Percentile    93 STS Failure Frequency (exact)
     0%          1.54E-03
     5%          4.48E-03
    10%          5.38E-03
    15%          6.12E-03
    20%          6.83E-03
    25%          7.49E-03
    30%          8.13E-03
    35%          8.?6E-03
    40%          9.54E-03
    45%          1.03E-02
    50%          1.11E-02
    55%          1.20E-02
    60%          1.29E-02
    65%          1.40E-02
    70%          1.53E-02
    75%          1.67E-02
    80%          1.88E-02
    85%          2.12E-02
    90%          2.49E-02
    95%          3.20E-02
   100%          1.62E-01

End of Forecast


Forecast: 93 STS (Sensitivity 1) Failure Frequency          Cell: E21

Summary:
   Display Range is from 0.00E+0 to [illegible]
   Entire Range is from 2.52E-3 to 1.66E-1
   After 14,020 Trials, the Std. Error of the Mean is 0.00

Statistics:                  Display Range    Entire Range
   Trials                    13731            14020
   Mean                      2.15E-02         2.28E-02
   Median (exact)            [illegible]      1.94E-02
   Mode (exact)              2.52E-03         2.52E-03
   Standard Deviation        [illegible]      1.33E-02
   Variance                  1.1?E-04         1.77E-04
   Skewness                  1.07             (unavailable)
   Kurtosis                  ?.91             (unavailable)
   Coeff. of Variability     0.50             [illegible]
   Range Minimum             0.00E+00         2.52E-03
   Range Maximum             [illegible]      1.66E-01
   Range Width               [illegible]      1.64E-01
   Mean Std. Error           [illegible]      1.12E-04

Percentiles for Entire Range:

   Percentile    93 STS (Sensitivity 1) Failure Frequency (exact)
     0%          2.52E-03
     5%          8.48E-03
    10%          1.02E-02
    15%          1.16E-02
    20%          1.27E-02
    25%          1.37E-02
    30%          1.48E-02
    35%          1.59E-02
    40%          1.70E-02
    45%          1.82E-02
    50%          1.94E-02
    55%          2.06E-02
    60%          2.21E-02
    65%          2.37E-02
    70%          2.55E-02
    75%          2.78E-02
    80%          3.04E-02
    85%          3.39E-02
    90%          3.89E-02
    95%          4.77E-02
   100%          1.66E-01

End of Forecast
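
The "Mean Std. Error" rows in these forecast reports are consistent with the usual
relation: standard error of the mean = standard deviation / sqrt(number of trials).
The short Python check below is a sketch, with the function name chosen for the
example, that applies this relation to the Sensitivity 1 entire-range values
(standard deviation 1.33E-02 over 14,020 trials). It also shows why the summary
line reports the standard error as 0.00 when rounded to two decimal places.

```python
import math

def mean_std_error(std_dev, trials):
    """Standard error of the sample mean: standard deviation / sqrt(trials)."""
    return std_dev / math.sqrt(trials)

if __name__ == "__main__":
    se = mean_std_error(1.33e-2, 14020)
    print(f"{se:.2E}")    # scientific notation, as in the statistics table
    print(f"{se:.2f}")    # two decimal places, as in the summary line
```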


Forecast: 93 STS (Sensitivity 2) Failure Frequency          Cell: E28

Summary:
   Display Range is from 0.00E+0 to 1.75E-1
   Entire Range is from 9.83E-4 to 1.37E+0
   After 14,020 Trials, the Std. Error of the Mean is 0.00

Statistics:                  Display Range    Entire Range
   Trials                    13618            14020
   Mean                      1.90E-02         2.35E-02
   Median (exact)            1.10E-02         1.12E-02
   Mode (exact)              9.83E-04         9.83E-04
   Standard Deviation        2.32E-02         4.96E-02
   Variance                  5.38E-04         2.46E-03
   Skewness                  3.13             (unavailable)
   Kurtosis                  14.?4            (unavailable)
   Coeff. of Variability     1.22             2.11
   Range Minimum             0.00E+00         9.83E-04
   Range Maximum             1.75E-01         1.37E+00
   Range Width               1.75E-01         1.37E+00
   Mean Std. Error           1.97E-04         4.19E-04

Percentiles for Entire Range:

   Percentile    93 STS (Sensitivity 2) Failure Frequency (exact)
     0%          9.83E-04
     5%          3.31E-03
    10%          4.13E-03
    15%          4.95E-03
    20%          5.67E-03
    25%          6.41E-03
    30%          7.19E-03
    35%          8.01E-03
    40%          8.?6E-03
    45%          9.?8E-03
    50%          1.12E-02
    55%          1.25E-02
    60%          1.42E-02
    65%          1.61E-02
    70%          1.85E-02
    75%          2.18E-02
    80%          2.67E-02
    85%          3.36E-02
    90%          4.66E-02
    95%          7.72E-02
   100%          1.37E+00

End of Forecast

Forecast: 93 STS (Sensitivity 3) Failure Frequency          Cell: E35

Summary:
   Display Range is from 0.00E+0 to 8.00E-2 per Launch
   Entire Range is from 9.00E-4 to 1.09E+0 per Launch
   After 14,020 Trials, the Std. Error of the Mean is 0.00

Statistics:                  Display Range    Entire Range
   Trials                    13871            14020
   Mean                      1.03E-02         [missing]
   Median (exact)            [illegible]      7.??E-03
   Mode (exact)              9.00E-04         9.00E-04
   Standard Deviation        1.01E-02         2.??E-02
   Variance                  1.02E-04         5.??E-04
   Skewness                  2.85             (unavailable)
   Kurtosis                  13.31            (unavailable)
   Coeff. of Variability     0.86             1.??
   Range Minimum             0.00E+00         9.00E-04
   Range Maximum             8.00E-02         1.09E+00
   Range Width               8.00E-02         1.09E+00
   Mean Std. Error           8.59E-05         [missing]

Percentiles for Entire Range (per Launch):

   Percentile    93 STS (Sensitivity 3) Failure Frequency (exact)
     0%          9.00E-04
     5%          2.1?E-03
    10%          [illegible]
    15%          [illegible]
    20%          4.??E-03
    25%          4.??E-03
    30%          5.??E-03
    35%          [illegible]
    40%          6.17E-03
    45%          6.67E-03
    50%          7.??E-03
    55%          8.??E-03
    60%          [illegible]
    65%          9.??E-03
    70%          1.1?E-02
    75%          [illegible]
    80%          1.40E-02
    85%          1.7?E-02
    90%          2.??E-02
    95%          [illegible]
   100%          [missing]

End of Forecast


Assumptions

Assumption: RSRB Pair [illegible]                           Cell: E3
   Lognormal distribution with parameters:
      Mean             7.80E-03  (=E3)
      Standard Dev.    8.28E-03  (=M3)
   Selected range is from 0.00E+0 to +Infinity
   Mean value in simulation was 7.79E-3

Assumption: 93 SSME                                         Cell: E5
   Lognormal distribution with parameters:
      Mean             4.68E-03  (=E5)
      Standard Dev.    5.74E-03  (=M5)
   Selected range is from 0.00E+0 to +Infinity
   Mean value in simulation was 4.70E-3

Assumption: 93 ET                                           Cell: E7
   Lognormal distribution with parameters:
      Mean             1.92E-04  (=E7)
      Standard Dev.    ?.?3E-04  (=M7)
   Selected range is from 0.00E+0 to +Infinity
   Mean value in simulation was 1.90E-4

Assumption: 93 Orbiter
   Lognormal distribution with parameters:
      Mean             4.10E-04  (=E9)
      Standard Dev.    3.33E-04  (=M9)
   Selected range is from 0.00E+0 to +Infinity
   Mean value in simulation was 4.11E-4

Assumption: 93 Prelaunch                                    Cell: E11
   Lognormal distribution with parameters:
      Mean             ?.??E-04  (=E11)
      Standard Dev.    ?.?5E-04  (=M11)
   Selected range is from 0.00E+0 to +Infinity
   Mean value in simulation was 7.02E-4

Assumption: Sensitivity 1                                   Cell: E19
   Lognormal distribution with parameters:
      Mean             1.66E-02  (=E19)
      Standard Dev.    1.20E-02  (=M19)
   Selected range is from 0.00E+0 to +Infinity
   Mean value in simulation was 1.66E-2

Assumption: Sensitivity 2                                   Cell: E26
   Lognormal distribution with parameters:
      Mean             1.?2E-02  (=E26)
      Standard Dev.    6.79E-02  (=M26)
   Selected range is from 0.00E+0 to +Infinity
   Mean value in simulation was 1.75E-2

Assumption: Sensitivity 3                                   Cell: E33
   Lognormal distribution with parameters:
      Mean             ?.??E-03  (=E33)
      Standard Dev.    [illegible]  (=M33)
   Selected range is from 0.00E+0 to +Infinity
   Mean value in simulation was 6.21E-3

End of Assumptions
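
The assumptions above specify each lognormal distribution by its arithmetic mean and
standard deviation. Many sampling routines instead take the mean and standard
deviation of the logarithm, so a conversion is needed before sampling. The Python
sketch below uses the standard lognormal identities; the function names are
illustrative, and the use of `random.lognormvariate` is an assumption for the
example, not the simulation tool used in the report.

```python
import math
import random

def lognormal_mu_sigma(mean, std_dev):
    """Convert an arithmetic mean and standard deviation to (mu, sigma) of ln(X)."""
    sigma2 = math.log(1.0 + (std_dev / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

def percentile(mean, std_dev, z):
    """Value at z standard normal deviates (z = 0 gives the median)."""
    mu, sigma = lognormal_mu_sigma(mean, std_dev)
    return math.exp(mu + z * sigma)

if __name__ == "__main__":
    mean, std_dev = 7.80e-3, 8.28e-3          # RSRB pair assumption (cell E3)
    mu, sigma = lognormal_mu_sigma(mean, std_dev)
    rng = random.Random(0)
    draws = [rng.lognormvariate(mu, sigma) for _ in range(20000)]
    print("sample mean:", sum(draws) / len(draws))
    print("median: 1 in", round(1.0 / percentile(mean, std_dev, 0.0)))
```

For the RSRB pair parameters the implied median works out to roughly 1 in 187,
in line with the 50th percentile reported for the RSRB pair in Table 1.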
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Shnnto  PRA  Fhixc  i - Galileo  Study  Update 


Appendix H:

Comments on the Differences between the Galileo Study Results and
Galileo-era Results in this Study

The first step in the current analysis was to ensure that the updated failure frequency distributions
resulted only from additional experience acquired since the Galileo study, and not from the inadvertent
introduction of different statistical methods, tools, or assumptions. Unfortunately, there was
insufficient information in the Galileo study report to exactly duplicate the previously published results
for the SRBs and the SSMEs. At the central values (mean and median) of the system level failure
frequency distributions, the regenerated values match very closely the published Galileo study values.
Specifically, the Galileo study results were 1/78 and 1/55 for the median and mean, respectively. The
corresponding values in the current study were 1/74 and 1/54. Since the updated (April 1993) results in
this study are not completely consistent with the earlier results (even though they were well within the
statistical certainty interval of the Galileo study), it was necessary to generate an intermediate set of
Galileo-era results using the original assumptions and data, but using the same statistical methods and
tools that were applied in this update. These intermediate results are therefore entirely consistent with
the updated results.

Since we were unable to perfectly duplicate the previously published results, it is impossible to know
precisely the sources of the differences between the Galileo study calculations and the calculations used
here. However, the principal source of the discrepancy appears to be a bias toward preserving the
extreme values (fifth and ninety-fifth percentiles) of lower level distributions when generating higher
level distributions (in the Galileo study), as opposed to preserving the central tendency (mean) of a
distribution and one extreme (as was done in this study). The problem occurs when aggregating
distributions or combining the distributions of risk contributors to generate the failure frequency
distribution of the overall system. In general, the lower level distributions may not be well behaved or
well modeled functions amenable to sampling for the Monte Carlo or Latin Hypercube simulations.

The lower level distributions must therefore be converted to readily sampled distributions, preserving as
much information about the original distribution as possible. In general this involves selecting the type
of distribution best suited to model the original distribution, and two points from the original
distribution to "anchor" the selected distribution type. In both the Galileo study and the current study
the distribution type used in the simulations was the lognormal. The analysts performing the Galileo
study appear to have selected the extremes (fifth and ninety-fifth percentiles) of the underlying
distributions to anchor their lognormal distributions, allowing the central tendencies of the underlying
distribution to "float" to fit the lognormal distribution.

While the process of anchoring the extremes may have been justified for the unique purpose of the
Galileo study, it was felt that for the purpose of this study, it was much more important that the central
tendencies and the worst case tendencies (the mean and ninety-fifth percentiles) be anchored when
converting distributions. In this study therefore, all distributions are generated using the maximum
likelihood estimator (MLE) as the mean (MLE = failures / exposure), and all distributions are converted
to lognormal preserving the mean and the error factor (EF). The error factor is determined by EF =
95th percentile / median. A converted distribution therefore preserves the mean and the relationship
between the median and worst-case end of the underlying distribution.
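
The conversion described above, a lognormal anchored on the mean and the error
factor, can be sketched in Python. This is a hedged illustration of one way to do
it using standard lognormal identities, not the report's actual worksheet: the
function names, the value 1.645 for the 95th percentile normal deviate, and the
example inputs (2 failures in 1000 exposures, EF of 4) are all assumptions.

```python
import math

Z95 = 1.645   # standard normal deviate at the 95th percentile

def lognormal_from_mean_and_ef(mean, error_factor):
    """Return (mu, sigma) of ln(X) for a lognormal with this mean and EF = 95th/median."""
    sigma = math.log(error_factor) / Z95      # because 95th = median * exp(Z95 * sigma)
    mu = math.log(mean) - sigma ** 2 / 2.0    # because mean = exp(mu + sigma^2 / 2)
    return mu, sigma

def summary(mean, error_factor):
    """Median, 95th percentile and standard deviation implied by mean and EF."""
    mu, sigma = lognormal_from_mean_and_ef(mean, error_factor)
    median = math.exp(mu)
    return {
        "median": median,
        "p95": median * error_factor,
        "std_dev": mean * math.sqrt(math.exp(sigma ** 2) - 1.0),
    }

if __name__ == "__main__":
    mle_mean = 2 / 1000        # hypothetical MLE = failures / exposure
    print(summary(mle_mean, 4.0))
```

Because the mean is fixed first, the median and 95th percentile both shift when
the error factor changes, which is the "anchoring" choice the text contrasts with
the Galileo study's use of the fifth and ninety-fifth percentiles.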
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Appendix  I: 


Determination of Shuttle Catastrophic Failure Frequency Using No
Prior (non-Shuttle) Knowledge of SRB Failure Frequency


SAIC was asked to calculate the risk of Shuttle catastrophic ascent failure without using prior (non-
Shuttle) knowledge to determine the RSRB failure frequency distribution. This analysis was performed
and is shown here for contractual completeness. "Sensitivity 2" shows the estimated risk if the 51-L
failure is included as relevant, and "Sensitivity 3" shows the estimated risk if the 51-L failure is
discounted.

Figure I-1. Shuttle Failure Frequency Distributions

These cases show the failure frequency distribution for the RSRB pair (and the resulting STS failure
frequency distribution) if no prior knowledge about the reliability of the RSRB is assumed. In these
cases, the uncertainty in the failure frequency arises only from the statistical confidence associated with
the data of 1 failure in 110 RSRB launches (Sensitivity 2, including the 51-L failure) or 0 failures in
109 RSRB launches (Sensitivity 3, discounting the 51-L failure). We believe that the RSRB reliability
belongs in the set of all U.S. solid rockets reliability, and that the use of the solid rocket prior is
therefore justified. The appropriate distributions for general use are therefore the Base case and
Sensitivity 1, depending upon the extent to which the decision maker believes that design and
operational changes since the 51-L accident have controlled or mitigated the 51-L field joint failure
mode.
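
To give a feel for how much uncertainty the launch record alone leaves, the sketch
below computes a classical one-sided 95% upper confidence bound on the per-launch
failure probability from the two data sets quoted above. This is an assumption
made for illustration: the report does not state which statistical method it used
for the no-prior cases, and the exact binomial (Clopper-Pearson style) bound and
function names here are not taken from it.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))

def upper_bound(failures, launches, confidence=0.95):
    """One-sided upper confidence bound on the per-launch failure probability."""
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(100):              # bisection: binom_cdf decreases as p grows
        mid = (lo + hi) / 2.0
        if binom_cdf(failures, launches, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

if __name__ == "__main__":
    for label, k, n in (("Sensitivity 2", 1, 110), ("Sensitivity 3", 0, 109)):
        print(label, "failures/launches:", k / n, "95% upper bound:", upper_bound(k, n))
```

With zero failures the bound reduces to 1 - 0.05^(1/n), which shows that even a
clean record of 109 launches cannot by itself rule out failure frequencies of a
few percent.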
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Presentation  Viewgraphs 


Probabilistic  Risk  Assessment  of  the 
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Shuttle  PRA  Phase  1 Summary 
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f Risk  of  Catastrophic  Ascent  Failure  l 
Comparison  of  Today  with  Galileo  (STS-34) 
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Analysis  Method 
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Demonstrated  Reliability  Analysis 


Emphasis  added. 


Demonstrated  Reliability  Analysis 
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Demonstrated  Reliability  Analysis 

e.g.:  F.  Safie,  MSFC*;  R.  Biggs,  Rocketdyne 
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Results  used  by  SSME  Assessment  Team. 


Risk  Assessment 

Phase  1 - Space  Shuttle  PRA 


SSME  Analysis  - Startup 
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2.66  e-7  1 .45  e-6  2.47  e-6  7.91  e-6 

catastrophic failures per second of SSME burn.
Burn Duration = 520 seconds
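
A per-second failure rate can be turned into a per-flight probability by combining
it with the burn duration. The Python sketch below is illustrative only: it assumes
a constant failure rate over the burn (an exponential model), treats the three
engines of the cluster as independent, and ignores any separate startup
contribution. The four rates are the values shown on the viewgraph, whose column
labels did not survive in this copy.

```python
import math

BURN_SECONDS = 520.0      # burn duration stated on the viewgraph
ENGINES = 3               # the Shuttle flies a cluster of three SSMEs

def failure_probability(rate_per_second, seconds=BURN_SECONDS, engines=1):
    """Probability of at least one failure, assuming a constant failure rate."""
    return 1.0 - math.exp(-rate_per_second * seconds * engines)

if __name__ == "__main__":
    for rate in (2.66e-7, 1.45e-6, 2.47e-6, 7.91e-6):
        p_engine = failure_probability(rate)
        p_cluster = failure_probability(rate, engines=ENGINES)
        print(f"{rate:.2e}/s -> engine {p_engine:.2e}, cluster {p_cluster:.2e}")
```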
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Mean  Contribution  of  STS  Elements  to 
Risk  of  Catastrophic  Ascent  Failure 
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SSME  Test  Program 

In the Galileo RTG study, risk from SSME cluster was
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* as  long  as  failure  frequency  is  generally 
consistent  with  or  better  than  predictions. 


Comparison  of  Active  Launch  Vehicles  > 
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The  Shuttle  is  the  most  reliable  launch 
system  in  the  world. 


Conclusions 

Based on this study, the Space Shuttle


contributor  to  Shuttle  risk. 


1988 Galileo-era Risk of Ascent Failure
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SRB  Surrogate  and  Actual 
Failure  Frequency  Distributions 


SRB  Surrogate  and  Actual 
Failure  Frequency  Distributions 

Combining  surrogate  prior  experience  with  direct  SRB  experience  reduces 

uncertainty  in  estimated  SRB  reliability. 
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Risk  Analysis  Applied  To  The  Space 
Shuttle Main Engine:

Main Combustion Chamber (MCC)
Analysis

Main  Combustion  Chamber 
(MCC)  Risk  Analysis 

The risk analysis of the MCC used standard risk analysis techniques to assess the
contribution of MCC failure to the overall risk associated with the space shuttle
main engines. These methods included:

Master Logic Diagram (MLD). The MLD is used to identify in a consistent, logical,
and exhaustive method all events that could credibly cause failure of the MCC in
such a manner that a loss of vehicle or loss of mission could result. For example,
unstable crack growth is included in the MCC analysis but missing bolts on the
powerhead and MCC interface are excluded because they are not believed to be
credible.


Initiating  Event  Identification  and  Evaluation. 
The  initiating  events  are  obtained  directly  from 
the  MLD.  Those  events  that  start  the  logical 
sequences  leading  to  MCC  failure  are  grouped 
into  a set  of  events  for  further  analysis,  identified 
as  the  initiating  events. 

Functional Event Sequence Diagram (FESD). The FESD begins at the initiating events
and develops the sequence of pivotal events that must proceed to end at either
success or the MCC failure point. An example of a FESD is given in Figure 1 for the
Flow Recirculation Inhibitor (FRI) system in the MCC. All events are pivotal in the
sense that the event must have a yes-no, or on-off, type of output. These events
are then quantified by probabilistic analysis: e.g., the yes output occurs 95% of
the time, the no output 5%.


Event Tree Analysis. The chain of events developed during the FESD process is
placed in an event tree format. This format allows the sequence of events to be
quantified as to the contribution of the MCC to the overall SSME risk.
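
The quantification step outlined above, yes/no pivotal events multiplied along each
path of an event tree, can be sketched in Python. This is a simplified illustration
and not the study's model: it assumes independent pivotal events and a full binary
expansion of every path, and the initiator frequency, event names, and
probabilities in the usage example are hypothetical.

```python
from itertools import product

def event_tree(initiator_freq, pivotal_events):
    """Enumerate every yes/no path and return {path: frequency}.

    pivotal_events is a list of (name, p_yes) pairs; each path multiplies the
    initiator frequency by the branch probability taken at every pivotal event.
    """
    sequences = {}
    for outcome in product((True, False), repeat=len(pivotal_events)):
        freq = initiator_freq
        for (_, p_yes), yes in zip(pivotal_events, outcome):
            freq *= p_yes if yes else 1.0 - p_yes
        path = tuple((name, yes) for (name, _), yes in zip(pivotal_events, outcome))
        sequences[path] = freq
    return sequences

if __name__ == "__main__":
    # Hypothetical numbers, for illustration only
    tree = event_tree(1.0e-3, [("leak detected", 0.95), ("safe shutdown", 0.90)])
    any_no = sum(f for path, f in tree.items() if not all(yes for _, yes in path))
    print("frequency of sequences with any 'no' branch:", any_no)
```

The path frequencies always sum to the initiator frequency, which is a useful
consistency check when branch probabilities are entered by hand.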

[Figure: Flow Recirculation Inhibitor System Functional Event Sequence Diagram]

Figure 1. Example Functional Event Sequence Diagram

Each element of this risk assessment is discussed in the following sections. The
topics flow naturally from the Master Logic Diagram development to the initiator
identification, the development of the initiator frequencies, the construction of
the functional event sequence diagrams from the list of initiators, the event tree
development and quantification, the sensitivity and uncertainty analysis, and,
finally, some comments about the main combustion chamber risk.
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Main  Combustion 
Chamber Probabilistic
Risk Assessment

1 Master  Logic  Diagram  Analysis 

Introduction 

The Probabilistic Risk Assessment (PRA) of the Space Shuttle Main Engine (SSME)
Main Combustion Chamber (MCC) has proceeded in a classical PRA development. The
analyses began with the development of a Master Logic Diagram (MLD). In this
development all potential causes of the top event, loss of the orbiter, are
identified by use of a logic flow diagram that captures the logical operation of
the SSME and the interaction of SSME components. While the program began by
examining the full SSME it was quickly focused on the MCC, as well as the SSME
software. The evaluation of the SSME software is the topic of a different task and
is not reported here. The MLD, having captured the logic used in the design and
operation of the MCC, is evaluated to define all credible causes of a Loss Of
Vehicle (LOV) event. It is critical to note that these initiators are not
equivalent to CRIT-1 events. CRIT-1 events are failures that lead directly to the
loss of the engine. Initiators identified by the MLD may need other events to
occur simultaneously or in sequence to have a LOV event. Thus, what is identified
in the FMEA as a CRIT-3 event may, under the correct set of circumstances, lead to
a CRIT-1 consequence.

After defining the set of initiators each individual initiator is assessed for
further development in the Functional Event Sequence Diagram (FESD) task. In this
assessment the results of previous tests and analyses performed at Rocketdyne and
Marshall Space Flight Center are integrated to define the list of initiators that
require further development. Those initiators are input to the FESD analyses to
identify pivotal events.

These events are then formalized in an event tree analysis. Because of the
primarily structural nature of the MCC and the lack of mitigating functions for
off-nominal events, fault tree analysis of the MCC is limited in this study.

Master  Logic  Diagram  (MLD) 
Results 

The development of the MLD for the MCC began with a thorough review of the MCC
geometry, flow paths, operations, inputs, outputs, test histories, and failure
histories. The MCC design consists of an outer structural jacket forming the shape
of the combustion chamber liner and carrying the internal pressure and external
loads from the interfacing components.

The liner conducts hydrogen coolant in the axial direction and acts as a thermal
barrier between the jacket and the combustion gases. It also serves as a heat
exchanger to heat the hydrogen used to drive the Low Pressure Fuel TurboPump
(LPFTP). The LPFTP is not included in the PRA of the MCC but is critical to the
SSME PRA development. The coolant is carried along slotted channels in the liner
that are machined from a Narloy-Z material. The channels are closed out by
Electro-Deposited copper (EDCu) and Electro-Deposited nickel (EDNi). The copper is
in place to protect the nickel from non-cryogenic hydrogen embrittlement effects.
The liner is supported by the high strength (Inconel 718) structural jacket but is
attached only at the ends of the jacket. Structurally, the liner is required to
strain out to contact the jacket, to react the differential pressure load between
the coolant and combustion gases, and to accommodate the cyclic and thermal
ratcheting strain ranges arising from the extreme thermal operating environment.
The structural jacket is required to provide external support for the liner plus
react the internal combustion pressure loads as well as the thrust and gimballing
loads. While the liner is not attached all along the jacket, the liner motion is
restricted by the jacket.


The operating environment for the MCC is severe. Before the SSME firing the entire
liner is approximately -400 °F. During steady state operation the hot gas wall of
the liner is approximately 1,100 °F while the coolant side near the jacket is
-150 °F. Near the throat section of the MCC the coolant pressure is 6,300 psi while
the hot wall chamber pressure is 2,100 psi. The coolant channel height (measured
radially) is approximately 0.1 inch, which implies that a 1,250 °F temperature
gradient exists over a thickness equivalent to the thickness of a quarter. This
temperature differential also introduces a thermal strain mismatch at the
Narloy-Z/copper/nickel interfaces.


During detailed discussions with the SAIC and MSFC engineers and two
on-site meetings the MLD given in Figure 2 was agreed upon. This MLD
is used in the following section to identify those initiators for
use in the FESD and event tree development.
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unstable.  Therefore,  even  when  an  event  is 
listed  as  an  initiator  it  must  be  kept  in  mind  that 
such  initiators  are  dependent  on  time,  the  history 
of  the  MCC,  and  the  mission  requirements. 

Given  these  qualifiers  the  list  of  initiating  events 
is  shown  in  Table  I. 


Initiators Identified from MLD

Hot Gas Wall Crack
Coolant Channel Crack
(entry illegible in source)
EDNi Separation/Crack
Channel Blockage
FRI Leakage
Manifold Weld Failure
Actuator Sideload Instability
Loss of Powerhead Bolt Preload
Bent Nozzle Tube at MCC Interface
Combustion/Flow Instability
Loss of Pressure Sensor


Table I. Initiators Identified From MLD


Initiator  Evaluation 


List  Of  Initiators  and 
Examination  for  FESD 
Development 


The  results  of  the  MLD  evaluation  identified  a 
set  of  initiators  that  can  credibly  lead  to  the  LOV 
event  from  failures  in  the  MCC.  The  list  of 
initiators  is  comprehensive  and  to  the  extent 
possible  exclusive.  It  is  important  to  note  that 
there  are  overlapping  physical  conditions  that 
can  cause  one  "initiator"  to  appear  on  the  event 
sequence of another initiator. For example,
blockage of several coolant channels is an
initiator that can lead to an LOV event. It can be
the case that it is coupled with cracks
in the hot gas wall that under normal
cooling conditions are not CRIT-1
events but coupled with large thermal


Hot gas wall crack

If the hot gas wall crack is large enough then the MCC failure will
be immediate and catastrophic. However, there is sufficient evidence
from previous MSFC data to indicate that the MCC can withstand
substantial cracking without catastrophic failure. There are known
instances of pinholes and small cracks that have not caused
catastrophic failure. A specific example in which an MCC survived
with 37 inches of cracking (the cumulative sum of all cracks, not a
single crack length) has been documented. However, since the
mechanism for the crack stopping is not well understood, the event
"crack stops" or "crack is stable" cannot be assigned a 100%
probability. Therefore, further investigation of this sequence of
events is warranted.

Coolant Channel deformation/crack

If the MCC hot gas wall crack is expected to stop growing due to a
reduction in thermal stress from H2 leaking through the crack then
it is not possible to neglect the effect of the deformation or
cracking of coolant channels.

If reduced or lost flow in a coolant channel occurs because of
deformation or cracks then a localized hot spot can develop near a
cracked hot gas wall. In this case the thermal stress may induce
large crack growth and the initiator must be developed in the FESD
and event tree analysis.

G-15  bolt  failure 

There  is  evidence  from  the  MSFC  test  stand  data 
that  a single  bolt  failure  is  not  a catastrophic 
event.  In  the  development  of  the  event  tree 
this  should  be  accounted  for  by  a separate  path 
assuming  that  a report  referencing  this  data  is 
made  available. 

EDNi closeout separation/crack

The failure of the EDNi closeout has been assumed to be negligible
because the hot gas wall is (primarily) in a compressive stress
state. However, at the interface of the Narloy-Z, copper, and nickel
there are non-negligible shear forces because of the dissimilar
materials. The mismatch in shear modulus, Poisson's ratio, and
thermal expansion, while small, still introduces shear forces. The
extent of these shear forces is of concern. Also, given a shear
force, the frequency of the EDNi failure is of interest. If it can
easily be shown that the shear forces will not lead to a failure
rate of more than 1 in 10,000 per engine per flight then the overall
contribution to the risk will be so small that the pursuit of this
failure path is not important.
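
To put the per-engine screening value in vehicle terms, the sketch below converts a per-engine, per-flight rate into a per-flight probability for the three-engine SSME cluster. It is a minimal illustration, not part of the study: it assumes the three engines fail independently with the same rate, and the function and constant names are hypothetical.

```python
# Screening value quoted in the text: 1 in 10,000 per engine per flight.
SCREENING_RATE_PER_ENGINE = 1.0e-4


def cluster_probability(per_engine_rate, engines=3):
    """Probability that at least one engine in the cluster experiences the
    event on a flight, assuming independent engines with identical rates."""
    if not 0.0 <= per_engine_rate <= 1.0:
        raise ValueError("per_engine_rate must be a probability")
    return 1.0 - (1.0 - per_engine_rate) ** engines


def below_screening_value(per_engine_rate):
    """True when an estimated rate does not exceed the screening value."""
    return per_engine_rate <= SCREENING_RATE_PER_ENGINE


if __name__ == "__main__":
    p = cluster_probability(SCREENING_RATE_PER_ENGINE)
    print(f"Cluster probability per flight at the screening value: {p:.3e}")
```

At the screening value the cluster probability is close to three times the per-engine rate, because the rate is small enough that the cross terms are negligible.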

There are two important failure paths to be concerned with in the
consideration of the coolant channel closeout failure. The first is
labeled sub-interface failures and the second is labeled interface
failures. Either type of failure path can be initiated by a variety
of processes:

Manufacturing defects

Voids in the materials

Fatigue

Thermal ratcheting

Creep

Of course all of these may interact to produce early failures. For
this initial scoping effort it is assumed that a defect exists. The
question is then: What is the stress state and the potential for
defect propagation given that the defect exists?

The answer to this question involves complex, detailed analyses that
are time consuming to perform. However, it is possible to assess the
stress state in a somewhat simplified analysis to determine if the
shear stress is important to the potential failure of the
EDNi/Narloy-Z closeout.

It should be emphasized that the shear effect is important because
of the dissimilar material bond. It has been shown that a defect in
a combined shear and compressive stress field can exhibit Mode I
(tensile) as well as Mode II (shear) crack growth, depending on the
materials and crack orientation. For a defect in the interface the
crack acts as a "bubble" in which the effect of increasing the
compressive force is to increase the crack growth rate, exactly the
opposite of what is expected from single material crack growth
analysis.

To assess the effect of the MCC stress state on interface and
sub-interface crack growth rates two tasks must be considered.
First, an analysis of the frequency of debond failures in the MCC
liner must be performed. If the frequency of debond is high enough
then it is necessary to perform some stress analyses of the
dissimilar material interface area to determine if the area is
susceptible to larger crack growth rates than was previously
expected.

The  basic  conclusion  is  that  the  EDNi  and 
Narloy-z  interface  must  be  studied  in  more 
detail.  The  following  sections  provide  the 
debond  rate  data  analysis  and  simplified  stress 
analyses. 

Multiple  coolant  channel  blockage 

Because of the concern over the thermal loading of the MCC liner
wall, the closeout, and the hot gas wall, and because cooling
reduces the thermal stress in these areas, the possibility of
coolant channels becoming blocked, even partially, must be
considered: blockage could affect the fracture behavior or crack
growth characteristics in these other areas. While a priori it is
expected that these events will have very little chance of causing a
loss of MCC event, they must still be examined.

FRI  system  failure 

The FRI system protects the MCC interface with the nozzle from
exhaust gas re-circulation during mainstage firing. The increased
thermal stresses from such a re-circulation pattern could affect the
nozzle tubes, the MCC liner turn around duct, or the MCC to nozzle
bolts. This initiator must be included in functional event sequence
diagram analyses.

Manifold  weld  failure 


Clearly, the failure of the manifold weld is of critical concern. It
is believed that a manifold weld failure caused a catastrophic
failure of an engine during testing. The evidence was not 100%
conclusive because of the large-scale destruction of the engine.
However, even a belief that the weld failure could have led to a
loss of an engine requires that it be included in further risk
assessment studies.

Actuator  sideload  instability 

MSFC engineers have stated that the force that the actuator develops
during any flight is insufficient to cause buckling of the MCC.
This analysis was carried out by Rocketdyne and, pending the receipt
of this report, it is not considered further.

Loss  of  powerhead  bolt  preload 

There is some recent test stand data that indicates that the MCC
could survive the loss of a single bolt. This will certainly reduce
the frequency with which one reaches a loss of MCC event from this
initiator, but since it is not known if there is a zero probability
of the loss of the MCC this initiator must be evaluated by FESD
analysis.

Bent nozzle tubes at MCC/nozzle interface

After meeting with MSFC engineers it was concluded that this
initiator, while near the MCC and nozzle interface, was outside the
scope of a risk assessment of the MCC. Therefore, this initiator was
not considered further.


Combustion/flow instability

During meetings with MSFC and by careful consideration of the MLD it
was concluded that combustion or flow instabilities are not true
initiators but rather are the result of some other initiating event.
They will be pivotal events in the FESD construction; however, the
combustion or flow instabilities result from causes outside the MCC
or from other, already identified, initiating events within the MCC
boundaries.

Loss  of  pressure  sensor 

When the sequence of events after the loss of the pressure sensor is
examined it becomes clear that there are no events within the MCC as
a result of this sensor failure. If the sensor becomes plugged then
it is within the bounds of the MCC but all events occur at the
controller and turbomachinery. Therefore, the loss of pressure
sensor was not included during the FESD development.

Based on these examinations there are seven initiators which warrant
more detailed investigation in the functional event sequence diagram
analysis. These are listed in Table II. Before proceeding to the
FESD construction the frequency with which each initiator occurs is
estimated using the existing data from MSFC tests and orbiter flight
data bases.

Summary  and  Conclusions 

The evaluation of initiating and pivotal event frequencies indicated
that the debond and/or cracking of the Electro-Deposited Nickel
(EDNi) close-out layer of the MCC liner is occurring more
frequently. There are difficulties in evaluating these data. First,
there are relatively few failures of the MCC and, therefore, the
associated uncertainties are large. Second, when there are
catastrophic failures of the MCC the design is usually changed to
remove these sources of failure. In this case it is necessary to
discount previous failures (i.e., count them with less than unity
probability of occurrence) or to show through analysis how these
design changes or operational changes have affected the MCC
performance. To perform such analysis it is necessary to understand
the MCC construction and the function of the liner. To show how
physical and/or phenomenological models are used to quantify the
failure frequency of initiating or pivotal events the EDNi layer
debond is investigated.
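
The idea of discounting earlier failures can be illustrated with a short calculation. The sketch below is a hypothetical illustration, not the method used in the study: each recorded failure is given a weight between 0 and 1 (1 for a failure mode still present in the current design, less than 1 for a mode addressed by a design change), and the weighted count is divided by the accumulated exposure. The weights and exposure in the example are invented for illustration.

```python
def discounted_rate(discount_weights, exposure):
    """Point estimate of a failure rate from discounted failure counts.

    discount_weights: one weight per recorded failure, each in [0, 1].
        A weight of 1 counts the failure fully; a smaller weight reflects
        a judgment that a design or operational change made it less
        relevant to the current configuration.
    exposure: accumulated exposure (e.g. test seconds or flights), > 0.
    """
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    for w in discount_weights:
        if not 0.0 <= w <= 1.0:
            raise ValueError("each weight must lie in [0, 1]")
    return sum(discount_weights) / exposure


if __name__ == "__main__":
    # Hypothetical: three recorded failures, two of them partly discounted.
    weights = [1.0, 0.5, 0.1]
    print(discounted_rate(weights, exposure=1000.0))
```

With few failures the result is sensitive to the chosen weights, which is one reason the associated uncertainties are large.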


Initiators Included in FESD Analysis

(entry partly illegible in source) Cracks
Coolant Channel Cracks
(entry illegible in source)
Manifold weld failure


Table II. Initiating Events For Functional
Event Sequence Diagram Analysis


Stress  Analyses  and 
Crack  Growth: 

Application To The MCC
Liner

The SSME MCC configuration and cross-section are shown in Figures 3
and 4. The MCC design consists of an outer structural jacket forming
the shape of the combustion chamber liner and carrying the internal
pressure loads and the external loads from interfacing components.
The liner is attached to the jacket at the ends of the structure.
The liner is made of Narloy-Z with coolant channels machined into
the liner in the axial direction. The coolant channels are closed
out by electro-deposited nickel (EDNi), with a thin copper layer to
protect the nickel material from hydrogen embrittlement. The liner
is supported by a high-strength (Inconel 718) jacket that restricts
liner motion during engine operation. Thus, although the liner is
not attached to the jacket all along the MCC, its motion is
restricted. During steady-state operation the liner hot gas surface
is nominally at 1,100°F and the back wall on the jacket side is
typically -150°F. During start and cutoff the complete liner
temperature reaches -400°F. Near the MCC throat area, the hot wall
chamber pressure is approximately 2,100 psi, while the internal
pressure of the coolant hydrogen is 6,300 psi.


accumulation  process  in  the  MCC 
liner  has  been  previously  analyzed 
as  composed  primarily  of  creep 
and  thermal  racheting.  The  effort 
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in  the  past  has  focused  on  wall  thinning  and 
crack  growth  in  the  hot  gas  wall  side  of  the  liner, 
because  there  has  been  test  data  taken  in  which 
the  hot  gas  wall  has  developed  through-wall 
cracks. 

The  failure  of  the  EDNi  closeout  has  been 
assumed  to  be  negligible  because  the  hot  gas 
wall  is  (primarily)  in  a compressive  stress  state. 

Under pressure-only loading conditions, this is valid. In fact, for
a single material, under both pressure and temperature loading, the
stress field may be primarily compressive. However, at the interface
of the Narloy-Z, copper, and nickel, there are non-negligible shear
forces because of the dissimilar materials.

The mismatch in shear modulus, Poisson's ratio, and thermal
expansion, while small, still introduces shear forces. The extent of
these shear forces is of concern. Also, given a shear force, the
frequency of the EDNi failure is of interest.

If  it  can  easily  be  shown  that  the  shear  forces 
will  not  lead  to  a failure  rate  of  more  than  one  in 
50,000  per  engine,  then  the  overall  contribution 
to  the  risk  will  be  so  small  that  the  pursuit  of  this 
failure  path  is  not  important. 

There  are  two  important  failure  paths  to  consider 
regarding  the  coolant  channel  closeout  failure: 
sub-interface  failures  and  interface  failures. 

Either  failure  path  can  be  initiated  by  a variety  of 
processes: 

Manufacturing defects

Voids in the materials

Fatigue

Thermal ratcheting

Creep

Of  course  all  these  may  interact  to  produce  early 
failures.  For  this  initial  scoping  effort  it  is 
assumed  that  a defect  exists.  The  question  is 
then:  What  are  the  stress  state  and  potential  for 
defect  propagation,  given  that  the  defect  exists? 

The answer to this question involves complex, detailed analyses that
are time consuming to perform. However, it is possible to assess the
stress state in a simplified analysis to determine if the shear
stress is important to the potential failure of the EDNi/Narloy-Z
closeout.

It is important to emphasize that the shear effect is significant
because of the dissimilar weld. It has been shown that a defect in a
combined shear and compressive stress field can exhibit Mode I
(tensile), as well as Mode II (shear) crack growth depending on the
materials and crack orientation. For a defect in the interface, the
crack acts as a "bubble" in which the effect of increasing the
compressive force is to increase the crack growth rate, exactly the
opposite of what is expected from single material crack growth
analysis.

To assess the effect of the MCC stress state on interface and
sub-interface crack growth rates, a simplified stress analysis has
been performed. This analysis examines a realistic MCC geometry and
calculates the stress in the liner cavity. The details of this
analysis are provided in the following section.

Stress Analysis of MCC Liner

The stress analysis of the MCC liner includes the geometry shown in
Figure 4. This geometry is analyzed to determine the stress state
near the interface of the EDNi closeout of the Narloy-Z liner
material. To provide a realistic approximation to the actual stress
state, both the thermal and mechanical stress states must be
calculated. Because of the approximate nature of this analysis,
previous thermal analysis of the MCC will be used to define the
temperature profile in the MCC liner. The temperature profiles are
obtained from references [1] and [2]. The temperature profile
induces thermal stresses in the Narloy-Z material, which is where
the primary heat transfer occurs. The cold wall side of the liner
also has a temperature field profile, although the gradient is
substantially smaller than on the hot wall side. Figure 5 shows the
temperature profile as calculated in reference [1]. The profile
shown in Figure 5 is for the area of the liner approximately one
inch upstream from the throat. The temperature profile is assumed to
be the same for the liner at all axial points, except for the wall
boundary condition. Thus, a simple ratio is used to determine the
temperature distribution throughout the liner channel at locations
away from the throat. When both temperature and pressure loading are
considered simultaneously, a non-negligible shear stress is
introduced near the interface layer.

The Narloy-Z material is strong enough to withstand this combined
stress field, and there are many thousands of seconds of test data
to support this claim. The area of greatest concern comes from the
introduction of a defect at, or near, the interface of the Narloy-Z
and nickel. The introduction of a crack-like defect causes different
behavior and stress loading on the crack tips due to the dissimilar
material effect. An important effect has been observed for a
sub-interface crack with a crack face contact zone in a combined
compressive and shear field [3]. Increasing the level of the
compressive stress may result in an increase of the stress intensity
factor $K_I$ at one of the crack tips [2]. The actual value of the
stress intensity factor will depend on many parameters including the
elastic modulus, Poisson's ratio, the distance from the near crack
tip to the interface, the orientation of the major crack axis
compared to the interface, and the normal and shear stress levels.
Thus, there is a need to demonstrate that the stress field does
contain shear forces and to estimate their effect on the crack
behavior.


To estimate the normal and shear stress fields in the material, a
simplified, but realistic, finite element analysis of the MCC liner
was undertaken. The geometry for this analysis is assumed to be
axisymmetric as shown in Figure 6. The boundary conditions for the
analysis are a combined pressure and temperature loading. The two
layers modeled were of Narloy-Z and nickel.


The elastic modulus, Poisson's ratio, and thermal expansion
coefficient are all functions of temperature [4]. For this study,
the effect of the thermal variation in material properties was not
included, in order to limit the analyses to a linear problem. Thus,
the material property data used was for the temperature condition at
the MCC throat. By solving the linear problem the principle of
superposition can be used to combine the pressure and temperature
conditions due to changing power levels.
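
Because the problem is linear, the stress field for any power level can be assembled from two reference solutions. The sketch below illustrates that idea only; it assumes a pressure-only and a temperature-only solution are available at the same set of points, and that each scales with a single factor. The function name, the scale factors, and the sample values are hypothetical and are not taken from the study.

```python
def superpose(pressure_case, thermal_case, pressure_scale, thermal_scale):
    """Combine two linear load-case solutions point by point.

    pressure_case, thermal_case: stress values (same units, same points)
        from the pressure-only and temperature-only reference analyses.
    pressure_scale, thermal_scale: ratios of the applied pressure and
        temperature loads to the reference loads.
    """
    if len(pressure_case) != len(thermal_case):
        raise ValueError("load cases must be sampled at the same points")
    return [
        pressure_scale * p + thermal_scale * t
        for p, t in zip(pressure_case, thermal_case)
    ]


if __name__ == "__main__":
    # Hypothetical reference solutions at two points.
    pressure_only = [1.0, 2.0]
    thermal_only = [10.0, 20.0]
    print(superpose(pressure_only, thermal_only, 2.0, 0.5))
```

This shortcut is only valid while the material properties are held fixed, which is why the temperature dependence of the properties was excluded from the analysis.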

Figure 7 shows the root mean square total strain state for a typical
analysis. The label "prob: shear-03" in Figure 7 implies that both
the MCC pressure and temperature field have been imposed on this
problem. The pressure is that of steady-state operation. Figure 8
shows the results of the stress analysis when only the pressure
field is applied. Because of the calculation procedure used in the
COSMOS* finite element package, the "shear" stress reported is the
stress in the x-y coordinate system. Since we are interested in the
interface layer stress, a post-processing program was developed to
change coordinate systems and obtain the interface shear stress.
This result is shown in Figure 9. As this figure indicates, the
shear force at the interface is not negligible and must be accounted
for in the crack growth analysis.
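
The change of coordinate systems described above is a standard plane stress transformation. The sketch below is not the post-processing program used in the study; it is a minimal illustration that assumes a two-dimensional stress state and that the interface orientation is known as the angle `theta_rad` from the x-axis to the interface normal. All names are hypothetical.

```python
import math


def interface_stresses(sigma_x, sigma_y, tau_xy, theta_rad):
    """Rotate an x-y plane stress state onto an inclined interface.

    theta_rad is the angle from the x-axis to the interface normal.
    Returns (normal stress on the interface, shear stress along it).
    """
    c = math.cos(theta_rad)
    s = math.sin(theta_rad)
    sigma_n = sigma_x * c * c + sigma_y * s * s + 2.0 * tau_xy * s * c
    tau_nt = (sigma_y - sigma_x) * s * c + tau_xy * (c * c - s * s)
    return sigma_n, tau_nt


if __name__ == "__main__":
    # Uniaxial stress of 100 (arbitrary units) seen on a 45-degree plane.
    print(interface_stresses(100.0, 0.0, 0.0, math.radians(45.0)))
```

A stress state with no x-y shear can still produce shear on an inclined interface, which is why the reported x-y "shear" cannot be read directly as the interface shear.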

To estimate the effect on the crack growth, the results of
calculations by Yang and Kim [5] are used. In this analysis the
stress intensity factor, $K_I$, is calculated. One of the results is
the estimate of $K_I$ as a function of the ratio of the normal
stress to the shear stress, $\sigma_N/\sigma_T$. For the nickel and
Narloy-Z material properties of interest, the Dundurs parameters
given in reference [4] are most closely approximated by $\alpha$ of
0.4 and $\beta$ of 0.125. Figure 10 shows a plot of $K_I$ normalized
by the stress intensity factor for a crack in an infinite plate
versus $\sigma_N/\sigma_T$. As this figure indicates, even for
compressive stresses (negative), the value of $K_I$ can be as high
as 60% of the stress intensity factor for the crack in an infinite
plate under tensile loading.

The crack growth rate is given by

$$\frac{da}{dN} = C\,(\Delta K)^{m} \qquad (1)$$

where $a$ is the crack length, $C$ and $m$ are material constants,
$N$ is the number of applied stress cycles, and $\Delta K$ is the
range of the stress intensity factor. If a value of
$1 \times 10^{-5}$ is used for $C$ and 4 for $m$, then we can
calculate the stress level necessary to double the crack length over
one cycle. If the initial crack size is 0.005 inches then it will
double in size if the stress level is approximately 35 ksi. If the
crack is half the width of the land then it will double if the
stress is 25 ksi. These must be viewed only as estimates because the
stress levels are outside the linear region and therefore equation
(1) is not accurate. Also, a doubling of the half-width crack would
require that edge effects be incorporated. Given these caveats,
these stress ranges are within ranges calculated in a previous
Rocketdyne report [6].
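
The order of magnitude of the doubling stress can be reproduced with a short calculation. The sketch below rests on several assumptions that are not stated in the text: the growth law has the Paris form da/dN = C (ΔK)^m, the stress intensity range is that of a crack in an infinite plate, ΔK = Δσ sqrt(π a), the quoted 0.005 inch is used as the crack size a, the growth in one cycle is approximated by a single step da = a, and C is expressed in units consistent with inches and ksi·sqrt(in). The function name is hypothetical.

```python
import math


def stress_to_double_in_one_cycle(crack_size_in, c=1.0e-5, m=4.0):
    """Stress range (ksi) for which one cycle of growth equals the crack size.

    Assumes da/dN = c * (delta_K)**m with delta_K = stress * sqrt(pi * a),
    and approximates the one-cycle growth by a single step da = a.
    """
    if crack_size_in <= 0:
        raise ValueError("crack size must be positive")
    # Solve a = c * delta_K**m for delta_K (ksi * sqrt(in)).
    delta_k = (crack_size_in / c) ** (1.0 / m)
    # Convert the stress intensity range to a remote stress range.
    return delta_k / math.sqrt(math.pi * crack_size_in)


if __name__ == "__main__":
    stress = stress_to_double_in_one_cycle(0.005)
    print(f"Stress to double a 0.005 in crack in one cycle: {stress:.1f} ksi")
```

Under these assumptions the result is in the high 30s of ksi, the same order as the approximately 35 ksi quoted above. A different crack-size convention or geometry factor would shift the number, so the sketch should be read as a consistency check only.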

Summary and Conclusions


The MCC liner has been investigated for fatigue and crack growth at,
or near, the Narloy-Z and EDNi interface. It was found that
sufficient shear-to-compressive stress ratios exist to cause defects
to grow during steady-state operation due to low cycle fatigue. The
stress analysis indicates that $K_I$ can be as large as 60% of the
$K_I$ value under normal tensile loads in an infinite plate even
when the hot gas wall is in compression. Local hot spots, throttle
down, and throttle up can cause a change in the crack length. As
with all material fracture, the larger the initial crack size the
larger the growth rate.


From  a risk  standpoint,  the  growth  of  cracks  in 
the  closeout  area  is  of  concern  if  the  inspection 
for  these  cracks  is  inefficient  and  if  the  leak  rate 
is  large  enough  to  deform  the  divergent  section 
of  the  nozzle.  Efficient  inspection  procedures 
can  substantially  reduce  the  risk. 

To fully integrate the effect of the EDNi closeout failure on MCC
risk requires that the initial distribution of defects, due to
either debond or cracking, be quantified. From the SSME database it
is possible to define the defect rate. An assumption about the size
of these defects is needed to fully quantify how many of these
defects can cause an initiating event for the failure of the MCC and
an LOV event.
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Figure  3.  Space  Shuttle  Main  Engine  Main  Combustion  Chamber  Configuration 
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Coolant  Channel  Thermal  Distribution  (110%  FPL  hg) 
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Figure  5.  Main  Combustion  Chamber  Liner  Temperature  Profile 
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Calculated  Shear  Strains  For  MCC  Liner  Wall 


Figure 9. Main Combustion Chamber Liner Interface Shear Stress Profile

[Plot: Mode I Stress Intensity Factor Versus Normal To Shear Stress
Ratio; legend includes Alpha = 0.4, Beta = 0.125; horizontal axis:
normal to shear stress ratio]

Figure 10. $K_I$ as a Function of the Ratio of the Normal to Shear
Stress, $\sigma_N/\sigma_T$
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Initiating  Event 
Frequency  Estimation 

INTRODUCTION 


Classically, most data analyses are only concerned with failure data
and failure rates. Because this is a demonstration project it must
remain clear that an initiating event does represent a failure of a
sub-component. However, it does not imply the failure of the MCC.
Therefore, in this chapter we will refer to sub-component failures,
e.g. hot gas wall cracks, as anomalies rather than failures. Then
when one sees an anomaly rate it will not be mistaken for an MCC
failure rate.


This study proceeded in two phases. The data received by SAIC
covered January 1983 through April 1993. The first step in the data
analyses is the examination of the data as it exists to estimate
initiating event frequencies and pivotal event frequencies. An
examination of the raw data also helps to provide closure to the MLD
analyses, since any events not previously considered should appear
in the data base if they are truly important. Therefore, the first
part of this chapter provides an overview of the methods and
analyses performed on the SSME data as received. The second portion
of this chapter then re-organizes the data into a form more suited
for the event tree analyses to be performed later.


PRACA  Data  Base  Analyses 


The anomaly data (both test and flight) for the MCC (Main Combustion
Chamber) from 1/14/1983 to 4/6/1993 have been studied. Because of
the limited data base, some assumptions are necessary. First, since
the successful test data between the anomalies are not available at
the present time, the accumulated MCC testing time is assumed to be
the same for each year. Second, the environments for different MCC
tests, such as qualification/certification test, alert, development
test, in-flight, acceptance test, and manufacturing, are not
discriminated in this study. Third, anomalies caused by different
anomaly modes are assumed to have the same consequence. Based on
these assumptions, the MCC anomalies are categorized into nine
anomaly modes. The contributions to the MCC anomaly made by each
anomaly mode are estimated. By applying the basic concept of the
"AGREE Allocation Method" [7], the anomaly occurrence rates of each
MCC anomaly mode are also calculated.

MCC  Anomaly  Modes 

There  are  eight  anomaly  modes  in  the  original 
MCC  data  base.  They  are: 

ET: Measurement Anomaly

EV:  Not-To-Specification 

MS:  Structure 

MT:  Pressure/Temperature  High  or  Low 

MU:  Mechanical  Tolerance 

MV:  External  Leak 

MW:  Internal  Leak 

UC:  Unsatisfactory  condition 

A large portion of MCC anomalies are related to contamination,
blanching, and surface roughness, which are not identified as
initiating events from the MLD analysis. Also, many inconsistencies
exist in categorizing anomaly modes in the original MCC data base.
For example, anomalies caused by material crack were placed in
anomaly category UC from 1983 to 1985, but in anomaly mode missing
copper or debond from 1986 to 1988. Therefore, the original data
base is re-categorized into the following nine groups:

BS: Blanching/Surface Roughness

CK: Crack and Pin Hole (Channel, 24%; Liner, 22%; Weld, 27%; Hot Gas
Wall, 11%)

CT: Contamination

LK: Leak - internal and external (Burst Diaphragm leakage, 54%)

MS: Structure (Missing Copper, 44%; Debond, 39%)

MT: Pressure/Temperature High or Low

MU: Mechanical Tolerance

WL: Weld Anomaly (not including crack)

RI: Random Individual Anomaly

Table III. Number of Failures Per Year For SSME MCC

[Table body not recoverable from source. Column headings: Year;
Blanching or surface roughness; Crack or pinhole; Contamination;
Leak, internal and external; Missing copper or debond; Pressure/
temperature either high or low; Mechanical tolerance; Weld; Random
individual failure; Total]

MCC  Anomaly  Mode 
Contribution  Estimation 

Based  on  these  new  categories  the  number  of 
anomalies  for  each  anomaly  mode  during  each 
year  are  listed  in  Table  III. 

The  average  annual  anomaly  number  (anomalies/ 
year)  caused  by  different  anomaly  modes  are 
estimated  by  using  the  following  formulas  181 

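$$\bar{\lambda}_{1,2} = \frac{\lambda_1 + \beta \lambda_2}{1 + \beta}$$

$$\bar{\lambda}_{1,n} = \frac{\bar{\lambda}_{1,n-1} + \beta \lambda_n}{1 + \beta}$$

where

$\beta$: Weight factor (= 1.5 in the present study)

$\lambda_1$: Anomaly occurrence rate for the first year
$\lambda_2$: Anomaly occurrence rate for the second year
$\bar{\lambda}_{1,2}$: Average anomaly occurrence rate for the 1st
and 2nd year

$\bar{\lambda}_{1,n}$: Average anomaly occurrence rate for the 1st
through nth year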
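
The weighted averaging described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the recursion is new average = (previous average + beta * current year) / (1 + beta), and the yearly counts used below are hypothetical inputs chosen because they reproduce the 7.5600, 3.0240, 1.2096 sequence that appears in Table IV.

```python
def weighted_running_average(annual_counts, beta=1.5):
    """Return the weighted running average after each year.

    Assumed recursion: avg_n = (avg_{n-1} + beta * count_n) / (1 + beta),
    so recent years carry more weight than earlier ones.
    """
    average = float(annual_counts[0])
    history = [average]
    for count in annual_counts[1:]:
        average = (average + beta * count) / (1.0 + beta)
        history.append(average)
    return history


# Hypothetical yearly anomaly counts for one anomaly mode.
example_counts = [9, 8, 7, 0, 0]
example_history = weighted_running_average(example_counts)
print([round(value, 4) for value in example_history])
```

With beta = 1.5 a year with no anomalies multiplies the running average by 0.4, which is the ratio seen between consecutive entries in several Table IV columns.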


The annual anomaly numbers versus years for
MCC and the top four anomaly contributors
(cracks and pinhole leaks, contamination,
blanching or surface roughness, and random,
individual anomaly) are illustrated in Figures 11,
12, 13, 14, 15 and 16.

The contributions to the MCC anomaly made by
different anomaly modes are listed in Table IV.
The top three contributors to the MCC anomalies
versus years are shown in Figure 17.

Anomaly  occurrence  rate 
Estimation  for  MCC  Anomaly 
Modes 

The  anomaly  occurrence  rates  for  each  MCC 
anomaly  mode  are  estimated  by  using  the  basic 
concept  of  the  AGREE  allocation  method  P1. 

FIGURE 13. Contamination Annual Anomaly Number

FIGURE 14. Blanching/Surface Roughness Annual Anomaly Number

FIGURE 15. Random Individual Annual Anomaly Number

FIGURE 16. Contributors to Anomaly Number

FIGURE 17. Top Three Contributors to Annual Anomaly Number

The anomaly occurrence rate for each MCC
anomaly mode is given by

$$\lambda = \frac{C \, [-\ln R(t)]}{\omega \, t}$$

where

$t$: mission time

$C$: contribution of the anomaly mode

$\omega$: importance factor for the anomaly mode,
P[MCC anomaly | anomaly caused by the
anomaly mode]

$R(t)$: MCC reliability for mission time t

For a mission time of 520 seconds, the contribution
of different anomaly modes can be obtained
from Table V. The importance factor for each
anomaly mode is 1 (based on the third assumption
described in the introduction).


Discussion  and  Summary 

The average annual anomaly number for MCC
dropped almost 80% from 1983 to 1992. Based
on the results obtained from this study, the
MCC reliability growth is trending toward a stable level.

Anomaly Modes Cracks and Pinhole Leaks
(CK), Contamination (CT), and Blanching or
Surface Roughness (BS)

The average annual anomaly numbers for the
anomaly modes cracks and pinhole leaks,
contamination, and blanching or surface roughness
dropped 88%, 73% and 78% respectively
(Figures 12, 13 and 14), but these modes remain
important contributors to the total MCC anomaly
(cracks and pinhole leaks 16%,
contamination 15%, blanching or surface roughness
19% for 1992).


The estimated anomaly occurrence
rates for MCC anomaly modes in
1992 are listed in Table VI.
Table IV. Weighted MCC Annual Anomaly Number (Anomalies/Year)

Year  BS      CK       CT      LK      MS      MT      MU      RI      WL      Total
84    ...     10.4000  3.0000  2.6000  0.8000  0.8000  3.2000  1.0000  2.8000  33.0000
85    7.5600  10.1600  1.8000  2.8400  2.7200  ...     ...     1.6000  4.1200  33.0000
86    3.0240  7.6640   1.9200  1.7360  2.8880  0.7280  0.7520  0.6400  5.2480  24.6000
87    1.2096  7.8656   1.9680  0.6944  1.1552  0.2912  0.3008  0.8560  2.0992  ...
88    0.4838  6.1462   0.7872  0.8778  0.4621  0.1165  0.1203  0.9424  0.8397  10.7760
89    0.7935  3.0585   0.9149  0.9511  0.1848  0.6466  0.0481  0.9770  0.3359  7.9104
90    0.3174  1.2234   0.9660  0.3804  4.2739  0.2586  0.0193  1.5908  0.1343  9.1642
91    1.9270  1.0894   3.9864  0.1522  1.7096  0.1035  ...     4.8363  0.0537  13.8657
92    1.9708  1.6357   1.5946  0.6609  1.2838  0.0414  0.0031  3.1345  0.0215  ...
Anomaly Mode Random, Individual Anomaly
(RI)

The  annual  anomaly  number  for  the  anomaly 
mode  random,  individual  anomaly  increased 
213%  from  1983  to  1992.  Since  dealing  with 
random,  individual  anomalies  is  a much  more 
difficult  task  than  other  anomaly  modes  (which 
are  more  specifically  defined),  it  is  expected  to 
see  random  individual  anomalies  take  a more 
important  position  for  the  MCC  reliability 


Anomaly Modes: Internal or External Leak
(LK) and Missing Copper or Debond (MS)

The MCC anomalies caused by internal or
external leak were dominated by “Burst Diaphragm
Leak” from 1983 to 1986. The latest
three anomalies (1988, 1989, and 1992) are not
related to “Burst Diaphragm Leak”. The
anomaly mode internal or external leak contributes
6% of the total MCC anomalies for 1992.


Column key: BS, Blanching or Surface Roughness; CK, Cracks and
pinholes; CT, Contamination; LK, Leaks: Internal and External;
MS, Structure: Missing copper or debond; MT, Pressure or
temperature either high or low; MU, Mechanical Tolerance;
RI, Random; WL, Weld failure.

Year  BS     CK     CT     LK     MS     MT    MU    RI     WL
83    17.6%  27.5%  11.8%  9.8%   3.9%   3.9%  9.8%  2.0%   13.7%
84    25.5%  31.5%  9.1%   7.9%   2.4%   2.4%  9.7%  3.0%   8.5%
85    22.9%  30.8%  5.5%   8.6%   8.2%   1.0%  5.7%  4.8%   12.5%
86    12.3%  31.2%  7.8%   7.1%   11.7%  3.0%  3.1%  2.6%   21.3%
87    7.4%   47.8%  12.0%  4.2%   7.0%   1.8%  1.8%  5.2%   12.8%
88    4.5%   57.0%  7.3%   8.1%   4.3%   1.1%  1.1%  8.7%   7.8%
89    10.0%  38.7%  11.6%  12.0%  2.3%   8.2%  0.6%  12.4%  4.2%
90    3.5%   13.3%  10.5%  4.2%   46.6%  2.8%  0.2%  17.4%  1.5%
91    13.9%  7.9%   28.8%  1.1%   12.3%  0.7%  0.1%  34.9%  0.4%
92    19.0%  15.8%  15.4%  6.4%   12.4%  0.4%  0.0%  30.3%  0.2%

Table V. Anomaly Mode Contributions to the MCC Anomaly Rate
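
As a cross-check, the percentage contributions in Table V appear to be each mode's weighted annual anomaly number divided by the yearly total in Table IV. The sketch below illustrates that reading for 1988; the assignment of the nine Table IV values to mode codes is an assumption based on the column order of Table V, not something the source states explicitly.

```python
def mode_contributions(weighted_counts):
    """Return each mode's share of the total weighted anomaly number."""
    total = sum(weighted_counts.values())
    return {mode: count / total for mode, count in weighted_counts.items()}


# 1988 weighted annual anomaly numbers as read from Table IV
# (column-to-mode mapping is assumed).
row_1988 = {
    "BS": 0.4838, "CK": 6.1462, "CT": 0.7872,
    "LK": 0.8778, "MS": 0.4621, "MT": 0.1165,
    "MU": 0.1203, "RI": 0.9424, "WL": 0.8397,
}

shares_1988 = mode_contributions(row_1988)
for mode, share in shares_1988.items():
    print(f"{mode}: {100.0 * share:.1f}%")
```

Under this assumption the CK share for 1988 comes out near 57%, in line with the Table V entry for that year.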


improvement in the later MCC
performance period.

There are 7 MCC anomalies caused by anomaly
mode missing copper or debond in 1990, which
drove the MS annual anomaly number high.
Anomaly mode (1992)                              Rate      Equivalent frequency
Blanching or Surface Roughness                   5.75E-08  1 in 33,438 missions
Cracks and pinholes                              4.77E-08  1 in 40,287 missions
Contamination                                    4.65E-08  1 in 41,328 missions
Leaks: Internal and External                     1.93E-08  1 in 99,717 missions
Structure: Missing copper or debond              3.75E-08  1 in 51,331 missions
Pressure or temperature either high or low       1.21E-09  1 in 1,592,479 missions
Mechanical Tolerance                             8.99E-11  1 in 21,394,699 missions
Random                                           9.15E-08  1 in 21,024 missions
Weld failure                                     6.27E-10  1 in 3,065,704 missions
TOTAL                                            2.09E-07  1 in 9,222 missions

Table VI. 1992 Estimated MCC Anomaly Rate
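
The "1 in N missions" figures in Table VI are consistent with taking the reciprocal of the rate multiplied by the 520-second mission time. The sketch below illustrates that conversion; it assumes the tabulated rates are per second of operation, which the table itself does not state, and small differences from the printed values are expected because the rates are rounded to three significant figures.

```python
MISSION_TIME_S = 520.0  # mission time used in the study


def missions_per_anomaly(rate_per_second, mission_time_s=MISSION_TIME_S):
    """Approximate number of missions per anomaly for a small constant rate."""
    return 1.0 / (rate_per_second * mission_time_s)


# Rates as printed in Table VI (assumed to be per second).
table_vi_rates = {
    "Blanching or Surface Roughness": 5.75e-08,
    "Cracks and pinholes": 4.77e-08,
    "Random": 9.15e-08,
    "TOTAL": 2.09e-07,
}

for mode, rate in table_vi_rates.items():
    print(f"{mode}: about 1 in {missions_per_anomaly(rate):,.0f} missions")
```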


More than 83% of the missing copper or debond
anomalies were caused by “Missing Copper and
Debond”. The anomaly mode missing copper or
debond contributes 12% of the total MCC
anomalies for 1992.

Anomaly  Modes  Pressure/Temperature  High 
or  Low  (MT),  Mechanical  Tolerance  (MU), 
and  Weld  Anomaly  (WL) 

The contributions to the MCC anomaly made by
pressure/temperature high or low, mechanical
tolerance and weld anomaly have been relatively low in
recent years (in total, less than 1% of the
MCC anomalies are contributed by these three
anomaly modes in 1992).

Estimating  MLD  Initiating 
Event  Frequencies 


The  previous  analysis  demonstrated  that  events 
that  are  related  to  the  initiating  events  identified 
from  the  MLD  are  occurring  frequently  enough 
to  warrant  further,  detailed  study. 


Unfortunately, the detailed records needed to
study the thirteen initiators identified by the
Master Logic Diagram are available only from
1988 to 1992. Previous data did
not contain enough information to
separate the data into MLD
initiating event categories. The annual events
attributed to each initiating event are listed in
Table VII. The average annual initiating event
frequencies are based on two considerations:

The accumulated MCC testing or flight
time for each year, and
The "reliability growth effect".

The  following  formulas  have  been  used  to 
estimate  the  average  annual  initiating  event 
frequencies: 

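$$\bar{F}_{1,2} = \frac{T_{1,2} F_1 + \beta F_2}{1 + \beta}$$

$$\bar{F}_{1,3} = \frac{T_{2,3} \bar{F}_{1,2} + \beta F_3}{1 + \beta}$$

$$\bar{F}_{1,n} = \frac{T_{n-1,n} \bar{F}_{1,n-1} + \beta F_n}{1 + \beta}$$

where

$\beta$: weight factor, 1.5 in present study
$F_i$: Number of events in the ith year
$T_{i,j}$: Time factor, $T_{i,j} = T_j / T_i$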

The accumulated test/flight times used for the T
values are shown in Table VIII. The values for F
are calculated based on the data in Tables VII
and VIII. The results are shown in Table IX.
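
A minimal sketch of the time-factor weighting follows. It assumes that each year's time factor is the next year's accumulated test/flight seconds divided by the current year's (with the final year set to 1), which reproduces the tabulated factors 0.9757, 1.1866, 0.7750 and 1.1689. The assignment of the five tabulated second counts to 1988 through 1992 is also an assumption, and the event counts in the usage line are hypothetical.

```python
def time_factors(seconds_by_year):
    """Time factor per year: next year's seconds over this year's seconds."""
    years = sorted(seconds_by_year)
    factors = {}
    for this_year, next_year in zip(years, years[1:]):
        factors[this_year] = seconds_by_year[next_year] / seconds_by_year[this_year]
    factors[years[-1]] = 1.0  # no later year to scale to
    return factors


def weighted_event_average(events_by_year, seconds_by_year, beta=1.5):
    """Apply F_avg = (T * F_avg_prev + beta * F_year) / (1 + beta) year by year."""
    factors = time_factors(seconds_by_year)
    years = sorted(events_by_year)
    average = float(events_by_year[years[0]])
    for previous_year, year in zip(years, years[1:]):
        average = (factors[previous_year] * average
                   + beta * events_by_year[year]) / (1.0 + beta)
    return average


# Accumulated flight/test seconds (year assignment assumed).
seconds = {1988: 45268, 1989: 44166, 1990: 52407, 1991: 40614, 1992: 47475}
print({year: round(f, 4) for year, f in time_factors(seconds).items()})

# Hypothetical event counts for one initiator.
print(weighted_event_average({1988: 2, 1989: 1, 1990: 0}, seconds))
```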

It is now possible to estimate the individual
initiating event frequencies from Tables VII
through IX. Using the AGREE allocation
method, the initiating event frequency for event i
is calculated by the following formula:

$$\lambda_i = \frac{C_i \lambda_{MCC}}{\omega_i}$$

where

$\lambda_i$: Initiating event frequency estimate
$C_i$: Contribution of the ith initiator
$\lambda_{MCC}$: Frequency of all MCC initiating events
$\omega_i$: Importance factor for the ith initiator,
P(MCC failure | failure caused by the ith
initiator)

Table VII. Largest Contributors to MCC Anomaly Rate


The estimate for $\lambda_{MCC}$ in 1992 is 1.61 x 10^-4 per
second. This estimate is obtained by dividing
the total number of recorded anomalies by the
total equivalent test (or hot-fire) time. If it is
assumed that the average mission time is 520
seconds, the value of $\lambda_{MCC}$ implies
an initiator occurs about every 12
missions. This frequency is probably
too low, that is, initiating
events occur more frequently, but a more detailed
data analysis could improve this accuracy.
However, the purpose of this study is a demonstration
of the method; it is not to calculate the
detailed risk. Therefore, while the initiating
event frequency is believed to be realistic, the
current scope of the program does not allow a
more detailed analysis of data received after this
initial analysis was completed. The estimates for
the initiating event frequencies are provided in
Table XI.
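
The allocation step can be illustrated with a short sketch. It reads the total rate as 1.61 x 10^-4 per second (the reading that matches "about every 12 missions" at 520 seconds per mission), and it assumes an importance factor of 1 by default. The 10% contribution used in the example is hypothetical and is not taken from the tables.

```python
LAMBDA_MCC_PER_S = 1.61e-4  # total MCC initiating event rate, per second
MISSION_TIME_S = 520.0      # assumed average mission time


def initiator_rate(contribution, lambda_mcc=LAMBDA_MCC_PER_S, importance=1.0):
    """AGREE-style allocation: lambda_i = C_i * lambda_MCC / omega_i."""
    return contribution * lambda_mcc / importance


def missions_between_events(rate_per_second, mission_time_s=MISSION_TIME_S):
    """Approximate number of missions between events for a constant rate."""
    return 1.0 / (rate_per_second * mission_time_s)


# All initiators combined: roughly one event every 12 missions.
print(round(missions_between_events(LAMBDA_MCC_PER_S), 1))

# Hypothetical initiator contributing 10% of all initiating events.
example_rate = initiator_rate(0.10)
print(round(missions_between_events(example_rate)))
```

Because the allocation is linear in the contribution, an initiator with one tenth of the events occurs one tenth as often, which is the pattern expected in the "1 in N" entries of Table XI.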

Based on these results, it was decided after
examining the event tree diagrams (to be presented
in the following chapter) that a more
detailed analysis of the data was warranted for
the initiating event frequencies. These results
are presented in the following section.
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Table  VIII.  Events  Used  to  Estimate  MLD  Initiating  Event  Frequency 


Year                  88      89      90      91      92
Flight/Test seconds   45,268  44,166  52,407  40,614  47,475
Time factor           0.9757  1.1866  0.7750  1.1689  1.0000
Weight factor         1.5     1.5     1.5     1.5     1.5

Table IX. Test/Flight Time Used to Estimate MLD Initiating Event Frequency


Table  X.  Average  Annual  Initiating  Events  For  MLD  Initiators 
1988      1989      1990      1991        1992
Large     Large     1 in 82   1 in 89     ...
1 in 96   1 in 73   1 in 196  1 in 124    1 in 92
Large     Large     1 in 124  ...         1 in 201
Large     1 in 48   1 in 36   ...         1 in 34
Large     1 in 194  1 in 100  1 in 97     1 in 184
Large     Large     Large     Large       Large
1 in 32   1 in 99   1 in 267  1 in 600    1 in 1,134

1988      1989      1990      1991        1992
Large     Large     Large     1 in 172    1 in 325
Large     Large     Large     Large       Large
1 in 96   1 in 298  1 in 801  1 in 1,799  1 in 3,401
1 in 96   1 in 298  1 in 801  ...         1 in 297
Large     Large     Large     Large       1 in 152
1 in 46   1 in 37   1 in 70   1 in 82     1 in 156
1 in 12   1 in 12   1 in 12   1 in 12     1 in 12

Table XI. Estimated Initiating Event Frequencies For MLD Initiators


Re-Examination  of  PRACA 
Data  Base  For  MCC  Initiating 
Frequencies 

The MCC data bases used in this analysis are
MSFC Report,

SSME FRR Report, and

SSME Historical Data (NASA/MSFC).

MCC  Data  Analysis 

The  following  assumptions  were  utilized  during 
the  re-examination  of  the  available  SSME  data. 

HGW: Hot Gas Wall Crack

The pinholes and cracks on the hot gas wall are
counted.

The sizes and locations of the pinholes or cracks
are not distinguished.

Surface roughness, blanching, or
blister are not counted.

Abnormal data that are not representative
of the population as a whole (for
example, MCC 4011 had 141 holes and cracks in
only 8 starts) are eliminated from this analysis.

PBF: Bolt failure

Bolt failures such as bolt stretch, crack or fracture
are counted (obtained from SSME FRR Report).

ESC: EDNi crack - not aft end

The cold wall cracks which are not at the aft end
are counted.

Leakage in weld joint 15 (EDNi close out)
causing the MCC liner cavity pressure to increase
is counted.

5AE: EDNi crack - aft end

MCC cold wall cracks or debonds at the aft end
are counted.

FRI: Flow recirculation inhibitor system leakage

The FRI failures such as seal leak and seal
overheating are counted (obtained from SSME
FRR Report).

MWF:  Manifold  weld  failure 
Manifold  weld  cracks  are  counted. 


Lack  of  fusion,  microcracks  (acceptable  per  the 
weld  spec.)  in  welds  are  not  counted  as  weld 
failures. 

CCC: Coolant channel deformation/crack, and
CCB: Multiple coolant channel blockage.

Contaminations which do not cause blockage are
not counted as a failure.

Based on the existing data base, the MCC failures
caused by CCC or CCB cannot be explicitly
identified, and are eliminated in this analysis.

The  MCC  failure  data  (both  test  and  flight)  from 
1/5/88  to  4/6/93  has  been  used  for  this  study. 


Note: The total number of pinholes/cracks on
MCC 2024 was 30. These occurred over 5
years. The number of the HGW events for each
year in this case is assumed to be 6.


MCC  Initiator  Frequency  Estimation 

The  method  developed  for  the  MCC  initiator 
frequency  estimation  is  based  on  the  following 
assumptions: 

The MCCs considered in this analysis are assumed
to have the same physical conditions. They
are not discriminated.


The environments for different MCC tests such
as qualification/certification test, Alert, development
test, in flight, acceptance test and manufacturing
are not distinguished in this study.


In order to evaluate the MCC
“reliability growth effect”, a
weight factor has been used in this
study. The basic concept of using
this weight factor is to place more weight on the
current MCC initiating event than on the earlier
MCC initiating event in the MCC initiating event
frequency estimation. For example, the MCC
initiating events which occurred in 1992 are
weighted more than the MCC initiating events
which occurred in 1991. In this analysis, $\beta$ = 1
(no reliability growth effect), $\beta$ = 1.5, and $\beta$ = 2
(strong reliability growth effect) have been tested.

Since the successful test/flight data between the
MCC anomalies are not available at the present
time, the frequencies of MCC initiators are
estimated on a yearly basis (the accumulated
MCC test/flight time for each year is available).

The MCC initiating event frequencies are assumed
to be proportional to the length of the
accumulated MCC test/flight time.

For this set of assumptions the analysis of the
previous section is repeated. Using a standard
distribution, uncertainty ranges can be formed.

These results are presented in Tables IV-A, V-A,
and VI-A.
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Table  V-A.  Individual  Anomaly  Mode  Contributions  to  the  MCC  Anomaly  Rate 
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Table  VI-A.  1992  Estimated  MCC  Anomaly  Rate 


Functional Event
Sequence Diagrams
(FESD's):

Application To The MCC

Manifold  Weld  Anomaly  Functional  Event 
Sequence  Diagram 

The functional event sequence diagram for the
Manifold Weld Anomaly, identified MWF in the
FESD, is shown in Figure 18. The entry condition
for this FESD is a crack of any size existing
in the weld material or Heat Affected Zone
(HAZ) of the parent material. Given that a crack
exists, the first question to ask is if it is large
enough to be detected, MWF-CD-001. If it is
large enough to be detected then it is assumed
that a repair of the crack is attempted. If the
repair is effective, a positive response to event
MWF-RE-001, then there is a successful operation.
If the repair is not effective then a crack
still exists in the structure and the FESD must
return to the path that examines the size of the
crack. If the crack in the weld or HAZ is small
enough to not grow to a critical size over one
mission, a positive response to MWF-LC-001,
then there is successful operation. If the crack
grows to a critical size then there would be a loss
of vehicle.

Figure 18. Manifold Weld Anomaly FESD
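
The Manifold Weld Anomaly sequence above can be quantified as a small event tree once branch probabilities are assigned. The sketch below is illustrative only: the function name and the three branch probabilities are hypothetical, and the report does not give values for MWF-CD-001, MWF-RE-001 or MWF-LC-001 in this section.

```python
def mwf_outcome_probabilities(p_detected, p_repair_effective, p_stays_subcritical):
    """Return success and loss-of-vehicle probabilities given an initial crack.

    Branches follow the FESD text:
      MWF-CD-001: crack is large enough to be detected
      MWF-RE-001: attempted repair is effective
      MWF-LC-001: remaining crack does not reach critical size in one mission
    """
    # A crack remains if it was never detected, or was detected but
    # the repair was not effective.
    p_crack_remains = (1.0 - p_detected) + p_detected * (1.0 - p_repair_effective)
    p_lov = p_crack_remains * (1.0 - p_stays_subcritical)
    return {"success": 1.0 - p_lov, "loss_of_vehicle": p_lov}


# Hypothetical branch probabilities, for illustration only.
outcome = mwf_outcome_probabilities(
    p_detected=0.90,
    p_repair_effective=0.95,
    p_stays_subcritical=0.999,
)
print(outcome)
```

Multiplying the loss-of-vehicle value by the initiating event frequency for this initiator would give the corresponding scenario frequency, which is the role the event trees play in the quantification.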

Bolt  Anomaly  Functional  Event  Sequence 
Diagram 

The functional event sequence diagram for the
Bolt Anomaly, identified PBF in the FESD, is
shown in Figure 19. The entry point for this
FESD is that the pre-load on the bolt is outside
the MSFC specifications. The evidence from
MSFC is that a single bolt failure due to incorrect
torque being applied, bolt stretch, or bolt
shear is insufficient to affect the operation of the
MCC. Therefore, the first question to be asked
is if more than a single bolt has failed, PBF-BB-001.
If more than a single bolt has failed, it is
assumed that the leakage and/or vibration loading
will lead to an LOV event. If only a single
bolt fails then a total loss of the bolt could also
lead to a leakage path causing LOV. If there is
not a total loss of the bolt, a negative response to
PBF-PL-001, then it is possible that the added
loading, both mechanical and thermal, on adjacent
bolts could cause their failure. The initial
FESD for the bolt loss included a branch for the
combustion or flow instability; however, MSFC
personnel stated in an August 24, 1993 meeting
that this is not possible for a partial loss of a
single bolt. The events PBF-CI-001 and PBF-SW-001
have been deleted from the PBF FESD.
This updated FESD is shown in Figure 20.

Figure 19. Bolt Anomaly FESD

Figure 20. Updated Bolt Anomaly FESD

Figure 21. Coolant Channel Blockage FESD


Coolant  Channel  Blockage  Functional  Event 
Sequence  Diagram 

The functional event sequence diagram for the
Coolant Channel Blockage Anomaly, identified
CCB in the FESD, is shown in Figure 21. The
entry condition for this FESD is loss of flow in
one or more channels.

The first event is whether enough blockage
occurs to starve the LPFTP or change the mixture
ratio enough to cause the controller to
change the oxidizer valve position, event CCB-CD-001.
If the LPFTP receives insufficient flow
then the engine must be shut down or there will
be an LOV event, CCB-SD-001. If the blockage
is insufficient to cause the LPFTP to fail then the
next event to check is whether the blockage
causes a crack in the liner, identified as CCB-DC-001.
If no crack is caused then the flow path
has been changed but there is no significant
effect on the MCC operation and there is successful
operation. If the blockage does cause a
crack, then the question, CCB-HG-001, is if
there is a crack on the hot gas wall. If there is a
crack, a positive response to CCB-HG-001, then
the sequence must either transfer to the coolant
channel crack, the EDNi closeout crack, or the
hot gas wall crack FESD.

Coolant Channel Cracks Functional Event
Sequence Diagram

The functional event sequence diagram for the
Coolant Channel Cracking Anomaly, identified
CCC in the FESD, is shown in Figure 22. The
entry point for this FESD is a crack of any size
within the land area of the Narloy liner. This is
an important definition for the remainder of the
FESD discussion. Cracks on the hot gas wall of
the Narloy or in the Narloy-copper-nickel interface
and nickel closeout are treated separately in
this study. Thus, if the coolant channel crack
occurs and it grows all the way through the land,
the net effect is to have turned two coolant
channels into one coolant channel, since a flow
path between the channels has now been created.
Because the fuel is undergoing a transition from
a liquid to gaseous phase there is the potential
for a mass flow rate reduction due to the
compressible nature of the fluid. Therefore, the heat
transfer characteristics of this type of anomaly
must be quantified through a separate FESD.

Figure 22. Coolant Channel Cracks FESD


The first event that is examined is whether there
are multiple cracks in the coolant channel land,
CCC-MC-001. If there are then the liner
strength is examined to determine if it has fallen
below the load level, CCC-LS-001. If it has then
there is a loss of cooling to the MCC, MCC
failure, and LOV. If the strength is not less than
the load level then the crack growth is examined
for stability. If the crack growth is dynamic it is
possible to change the liner geometry due to
bulging and cause a combustion and/or flow
instability, this branch identified as CCC-CI-001.
Such an instability could cause a shock wave
that would damage the nozzle and cause an
LOV. If a shock wave or flow instability is not
caused then the effect of a dynamic crack on the
overall liner strength must be examined. If the
ripping of the multiple channel lands reduces the
strength below the load level then there is a loss
of the MCC and LOV.


Note that several branches of the CCC FESD
converge to the point CCC-LS-001. This is
because the phenomenological sequences after a
no response to CCC-TS-001, CCC-CI-001, and
CCC-SW-001 are all identical. If the liner
strength remains above the load level then the
effect of the dynamic crack on the liner-to-jacket
weld must be examined. If the impact of the
dynamic crack on this weld causes weld failure
there will be leakage into the liner/jacket cavity.
This leakage will transfer to a point in the EDNi
FESD, just prior to ESC-BD-001. If the liner-to-jacket
weld does not fail then there is no adverse
effect of multiple coolant channel cracks on the
MCC and there is successful operation.

If there are not multiple channel cracks, a no
response to CCC-MC-001, then the next event
examined is if the coolant channel crack transfers
load to the hot gas wall causing a hot gas
wall crack, CCC-HG-001. A yes response to
this event causes a transfer into the HGW FESD.

If not then the same question is posed for the
closeout wall, CCC-CW-001. Again a positive
response causes a transfer to the ESC FESD. A
no response implies successful operation.

Flow  Recirculation  Inhibitor  Functional  Event 
Sequence  Diagram 

The functional event sequence diagram for the
Flow Recirculation Inhibitor Anomaly, identified
FRI in the FESD, is shown in Figure 23. It must
be noted that all seal leakage events have been
collapsed into this FESD. The other seal leakage
locations are: pressure port seal; contracting seal
at the MCC and powerhead interface; and the
seal at the MCC and injector plate interface. The
pressure port seal leads to events that are outside
the scope of the MCC restrictions placed on this
study. As discussed with MSFC staff at the
August 23 and 24, 1993 meetings the powerhead
seals are not actual seals. The contracting seal is
not meant to contain gas but rather to provide a
space for the contraction and expansion during
cool-down and engine firing. The inter-propellant
face seal already has a leak path provided by
the holes drilled in the plate. Therefore, any
anomaly causing leakage will only act to cool
the hot gas wall and will actually be beneficial.
Thus, the only seal of concern is the G15 bellows
seal. Since the FRI must fail before any
G15 anomaly would have any effect on the
MCC operation the FRI system leads to all seal
leakage problems.

Flow Recirculation Inhibitor System Functional Event Sequence Diagram

Given that the FRI has failed the first event to
consider is whether the hot gas bypasses the G15
seal or whether it recirculates in the gap between
the MCC and nozzle, event FRI-BY-001. If it
does bypass the G15 seal then the engine is
operating as designed and this is successful
operation. If the gas does not bypass the G15
seal then the sequence may proceed by failing
the G15 seal and allowing gas to escape, a
positive response to FRI-GE-001, or the seal
may contain the gas within the engine. Whether
the gas is contained or not the next three events
are identical in concept but their probability of
occurrence is different. For example, if the gas
is escaping the engine the force and temperature
change on the manifold may cause its failure,
event FRI-MF-001. If the gas is contained
within the engine then the manifold may still fail
because of the change in the thermal stress from
the FRI failure but the hot gas will not be in
direct contact, causing a lower probability of
occurrence.

Figure 24. Hot Gas Wall Cracks FESD

Hot Gas Wall Cracks Functional Event
Sequence Diagram

The functional event sequence diagram for the
Hot Gas Wall Cracks Anomaly, identified HGW
in the FESD, is shown in Figure 24. The entry
point for this FESD is any crack on the hot gas
wall surface of the MCC. The first event, HGW-TW-001,
is when the crack becomes a through-wall
crack. If the crack is not a through-wall
crack then there is successful operation. When
the crack becomes a through-wall crack, it can
undergo stable or unstable crack growth, represented
by event HGW-CG-001. In the situation
in which the crack growth is unstable or dynamic,
a similar set of event sequences as in the
coolant channel blockage and crack FESD's is
considered. In this sequence the possibility of a
combustion or flow instability, HGW-CI-001, is
examined; if it does not occur then the
possibility of the liner strength being reduced
below the load level, HGW-LS-001, is considered.
If the HGW-LS-001 event does occur then
there is a loss of the MCC and LOV. If not, then
the effect of an unstable crack being stopped at
the liner to jacket weld is examined by event
HGW-LW-001. If the weld fails, then there is a
transfer into the EDNi closeout separation or
crack FESD via transfer HGW-ESC-001. If
either the shock wave does not damage the
nozzle or the liner to jacket weld does not fail,
negative outputs from either event HGW-SW-001
or HGW-LW-001, then there is successful
operation.

If the crack is growing stably, a yes to event
HGW-CG-001, then the next event is fuel leaks
through the crack to relieve the thermal strain
and stop the crack growth, HGW-CS-001. If this
occurs then there is successful operation. If it
does not stop the crack growth then the event
"crack growth rate to critical size is less than
mission time" is examined, event HGW-GR-001.
If this event occurs, there is successful operation,
otherwise there is a loss of the MCC and subsequent
LOV.

Figure 25. EDNi Closeout Separation/Crack FESD


EDNi  Closeout  Separation/Crack  Functional 
Event  Sequence  Diagram 

The functional event sequence diagram for the
EDNi Closeout Separation/Crack Anomaly,
identified ESC in the FESD, is shown in Figure
25. The entry point for this FESD is any debond
of the Narloy-copper-nickel interface, a crack
within the shear zone of the interface in the
Narloy, or a crack in the nickel material. The
FESD for this event has been modified by the
MSFC engineers to account for the fact that
there can be a different path if the debond occurs
in the aft end of the liner. These changes are
reflected in the updated FESD.

Thus, the first event is the debond does not occur
in the aft end of the liner, event ESC-AE-001. If
this fails, then the sequence of events is to
check if the fuel jet does not cause a burn
through of the nozzle, ESC-BN-001. If it does
cause a burn through, a negative response to
ESC-BN-001, then there is a loss of vehicle
event. If no nozzle burn through occurs then the
next event is the aft EDNi leak does not fail
the G15 bellows seal, ESC-FB-001. If it does, a
no branch to this event, then a transfer into the
ESC FESD is made. If it does not fail the
bellows then the FRI system integrity is checked via
event ESC-FF-001. This event is FRI system
does not fail, which if true requires that the leak
rate into the aft compartment be checked. If the
leak rate does not pose a fire/explosion hazard, a
positive output from this event, then there is
successful operation.

Because  of  changes  to  the  EDNi  closeout  FESD 
there  are  also  changes  that  must  reflect  the  new 
transfer  points  in  the  FRI  FESD.  These  changes 
are  shown  in  Figure  26. 


Figure 26. Updated EDNi Closeout Separation/Crack FESD

Summary

The FESDs that have been constructed by
applying a structure to the logical sequence of
events can now be cast into a form that is
amenable to computer analysis. This form is the
event tree format mentioned previously. Based
on the FESDs just developed, the event trees can
be constructed relatively easily. In
order to quantify the event trees we first need to
perform some data analysis to define the
frequency with which different events occur or
states exist. The following section gives the data
quantification for the initiating events.
Event Tree Analysis:

Application  To  The  MCC 

Event  Tree  Event  Frequency  Evaluations 

The  FESD  diagrams  have  been  converted  to  an 
event  tree  format.  Quantitatively,  there  is  no 
essential  difference  between  the  event  tree  and 
the  corresponding  FESD.  Qualitatively,  the 
format  is  significantly  different,  and 
computationally,  there  are  several  computer 
programs  which  allow  for  easy  calculation  of  the 
top  event  frequency  given  the  pivotal  event 
frequency. 

The  next  step  in  the  analysis  must  be  the  assign- 
ment of  event  tree  probabilities  to  each  pivotal 
event.  In  the  cases  where  data  exist  to  calculate 
these  frequencies,  reasonable  estimates  can  be 
made.  Unfortunately,  there  is  very  little  data 
available  to  estimate  the  frequency  of  most 
pivotal  events.  This  implies  that  expert  opinion 
must  be  employed.  In  those  cases  in  which 
expert  opinion  is  used,  the  estimates  are  meant  to 
be  conservative. 

The  pivotal  event  frequencies  for  each  event  tree 
are  given  in  Table  XII.  The  frequencies  are 
based  on  previous  meetings  with  SAIC  and 
MSFC  engineers  as  well  as  data  from  the 
PRACA  data  base.  Some  general  comments 
about  each  of  the  event  trees  are  made  in  the 
following  sections. 
Table XII. Pivotal Event Frequencies


Coolant  Channel  Blockage  Pivotal  Event 
Frequencies 


The pivotal events listed in Table XII for the
Coolant Channel Blockage event tree are listed
under the nomenclature CCB. It is assumed that
the blockage of the coolant channel will have a
negligible effect on the mixture ratio 99% of the
time. However, if there is coolant channel
blockage, the probability of high thermal stresses
inducing cracks is significant. Therefore, the
event "no cracks" is assumed to occur only 5%
of the time. Because we are interested only in
those cracks that are initiated by CCB, the
hot gas wall (HGW) and coolant channel (CC)
wall cracks are assumed to occur only 10% of
the time in which there is blockage. It is
important to point out that this is not representative
of the hot gas wall cracking frequency but rather is
caused by the event of channel blockage either
from deformation or contamination. A conservative
estimate of 10% is used for the frequency with
which the LPFTP is affected by reduced H2 flow.

In reality it is expected that the amount of
blockage of the coolant channel will be low enough
that there will be no effect on the LPFTP with a
much higher frequency, say 99.9% of the time.
Finally, in a consistent manner throughout the
entire quantification, the effect of engine
shutdown is not accounted for in this study. This
implies that the frequency of the loss of the
MCC, as opposed to the frequency of the loss of
vehicle, is being examined. The event tree is
shown in Figure 27.

Coolant  Channel  Cracking  Pivotal  Event 
Frequencies 

The coolant channel cracking event frequencies
are similar to the CCB frequencies, with the
exception of considering stable and unstable
crack growth. All events associated with the
stable growth of cracks are assigned a 99%
frequency of occurrence. That is, 1 in 100
cracks will grow unstably, will have a stable
crack growth time less than the mission time,
and so forth. Each of these pivotal events is
listed as CCC. The event tree is shown in Figure
28.

Manifold Weld Anomaly Pivotal Event
Frequencies

An anomaly of the manifold weld is relatively
straightforward. Either a crack exists or it does
not. If it exists and is large then it can cause an
anomaly. Of course, if it is large then it is also
more easily detectable. Thus, the events
MWF-CD-001 and MWF-LC-001 are not independent.
If it is assumed that the crack in the manifold
weld area has a small chance of being detected,
then there is a corresponding increase in the
likelihood that the crack is small. If a crack is
detected, it is assumed that a repair is always
attempted. However, it is further assumed that
this repair is effective only 90% of the time.

This repair rate is conservative and attempts to
encompass the probability of introducing a flaw
as well as an incomplete repair. The event tree is
shown in Figure 29.

Bolt Anomaly Pivotal Event Frequencies

Recent evidence from MSFC tests has indicated
that the single bolt anomaly sequence is unlikely
to cause a significant likelihood of catastrophic
engine failure. Therefore, pivotal events in this
tree are assumed to have relatively high reliability,
with failures occurring only 1 in 1,000 times. The
event tree is shown in Figure 30.

Hot Gas Wall Pivotal Event Frequencies

The hot gas wall pivotal events have led to many
discussions between MSFC and SAIC staff
about what does and does not constitute a
credible event. Therefore, at this time some
discussion is warranted regarding test and flight
histories and their relevance to risk analysis.

In many of the developmental and flight MCCs
there have been many instances of cracking.
These cracks have ranged in size from
"pinhole" cracks to cracks eight inches long. In
every case to date, a crack has never been
observed to grow beyond the MCC throat area.
Because this has never been observed, the
occurrence of a crack which extends beyond the
throat is viewed as an incredible event by the
MSFC staff. However, if the load needed to
drive the crack through the throat area only
occurs, on the average, once in every one
hundred missions then there is a high probability
that this event simply has not been observed yet.

To demonstrate this, let us make some
conservative assumptions. First, assume that the entire
MCC test and flight history is equivalent to 500
missions. Second, assume that in one half of
these missions there is a crack near the throat
area. Third, assume that all of the missions have
the same statistical load spectrum. Finally,
assume that the load necessary to drive the crack
through the throat area occurs with a probability of
1%. In this case the probability that a crack would
have extended beyond the throat area at least once
is 91.8%.

There is still an 8.2% probability that the event
simply has not been observed! If the MCC is
cracked in only one fourth of the missions then
there is a 28.5% probability that the event will
not have been observed. While a substantial
number of MCCs have been cracked, this is still
less than a one-in-four mission probability.

Examined another way, if the frequency of
cracked MCCs is less than one in seven
missions, then there is approximately a 50% or
greater probability that a crack growing beyond the
throat area simply has not been observed. Of course,
the data can also be used to help bound the
probability of the load level needed to grow a crack
beyond the throat area.

For example, assume that the probability of the
load level needed to drive a crack beyond the
throat area is 10%. With all other assumptions
being the same, the probability that a crack
growing through the throat area has simply not
been observed is 3.6 x 10^-12. If the
incidence of MCC cracking is one in four
missions, this probability is still 1.9 x 10^-6.
Therefore, it is safe to assume that the frequency of
this load is substantially less than 10%.
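
The arithmetic behind this argument can be checked with a short sketch. It assumes, as the text does, that every cracked mission is an independent trial with the same load probability; the function name and argument names are illustrative only. Under this binomial reading the half-cracked, 1% case comes out near 8.1%, while the quoted 8.2% matches the Poisson approximation exp(-2.5); the other three figures agree with the text.

```python
def prob_not_observed(total_missions, cracked_fraction, load_probability):
    """Probability that no crack has ever extended beyond the throat.

    Assumes each cracked mission is an independent trial in which the
    critical load occurs with probability `load_probability`.
    """
    cracked_missions = total_missions * cracked_fraction
    return (1.0 - load_probability) ** cracked_missions


if __name__ == "__main__":
    # Cases discussed in the text: 500 equivalent missions,
    # half or one quarter of them cracked, load probability 1% or 10%.
    for fraction in (0.5, 0.25):
        for load_p in (0.01, 0.10):
            p = prob_not_observed(500, fraction, load_p)
            print(f"cracked fraction {fraction}, load prob {load_p}: {p:.3g}")
```

Because the result falls off exponentially with the number of cracked missions, a 10% load probability is effectively ruled out by the observed history, whereas a 1% load probability is not.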

This is the logic used to establish the
probabilities for the HGW crack event tree. The event
tree is shown in Figure 31.

Flow  Recirculation  Inhibitor  (FRI)  System 
Pivotal  Event  Frequencies 

The  failure  of  the  FRI  system  will  not  necessar- 
ily ensure  that  gas  will  recirculate  in  the  MCC 
and  nozzle  interface.  For  this  study,  it  is  as- 
sumed that  this  occurs  10%  of  the  time.  Since 
the  FRI  has  failed,  there  is  a high  probability  that 
the  exhaust  gas  will  leave  the  normal  gas  stream, 
i.e.  the  gas  will  not  recirculate  into  the  normal 
exhaust.  However,  based  on  MSFC  expertise,  it 
is  assumed  that,  for  99.99%  of  the  time,  the 
manifold,  bolts,  and  coolant  channel  at  the 
turnaround  weld  are  not  induced  to  fail  by  this 
gas  path.  The  event  tree  is  shown  in  Figure  32. 

EDNi  Closeout  Separation/Crack 

The  EDNi  closeout  is  divided  into  two  event 
trees.  The  first,  EAE,  is  for  the  case  when  the 
closeout  fails  in  the  aft  end.  This  was  a concern 
raised  by  MSFC  structural  engineers.  The 
second  event  tree,  ESC,  follows  the  events  more 
closely associated with the previous FMEA. The
event  trees  are  shown  in  Figure  33  and  Figure 
34. 

Figure 28. Coolant Channel Cracking Event Tree
Figure 29. Manifold Weld Anomaly Event Tree
Figure 31. Hot Gas Wall Crack Event Tree
Figure 32. Flow Recirculation System Event Tree
Figure 33. EDNi Crack: Aft End Event Tree
Figure 34. EDNi Crack: Not in Aft End Event Tree


Event  Tree  Quantification 

The data in Table XII were put into the event
trees given in Figures 27 through 34. The
calculations were made using Microsoft Excel®.
The results are shown in Table XIII, and they are
graphically depicted in Figure 35. The end result
is that the loss of the MCC is estimated at
approximately a 1 in 1,500 chance of occurrence
per mission.
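
As an illustration of what this kind of spreadsheet quantification does, the following minimal sketch multiplies an initiating event frequency through every branch combination of a small event tree and sums the sequences that end in failure. The event names, the initiating frequency, and the 50% detection probability are hypothetical; only the 90% repair effectiveness is taken from the text. The pivotal events are treated as independent, which the report notes is not strictly true for the manifold weld events.

```python
from itertools import product


def failure_frequency(initiator_freq, success_probs, is_failure):
    """Sum the frequency of every event tree sequence that ends in failure.

    success_probs maps each pivotal event name to its success probability.
    is_failure takes a dict {event name: succeeded?} and returns True when
    that sequence is a failure end state. Events are assumed independent.
    """
    names = list(success_probs)
    total = 0.0
    for outcome in product((True, False), repeat=len(names)):
        freq = initiator_freq
        for name, succeeded in zip(names, outcome):
            p = success_probs[name]
            freq *= p if succeeded else 1.0 - p
        if is_failure(dict(zip(names, outcome))):
            total += freq
    return total


# Hypothetical two-event tree: a crack must be detected AND repaired
# effectively for the sequence to end in successful operation.
example = failure_frequency(
    1.0e-3,  # hypothetical initiating event frequency per mission
    {"crack_detected": 0.5, "repair_effective": 0.9},
    lambda o: not (o["crack_detected"] and o["repair_effective"]),
)
print(f"failure frequency per mission: {example:.2e}")
```

Enumerating all branch combinations is wasteful for large trees, but it keeps the bookkeeping explicit for a tree of this size.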

Introduction


The uncertainty analyses of the MCC event trees
and risk models require that the frequency of
each pivotal event be represented by a
distribution. These distributions were developed, to the
extent possible, based on data obtained from
MSFC. Primarily, these data were based on the
PRACA database. The assumptions and results
of these analyses are contained in the chapter on
initiating event frequencies. This chapter recalls
the results of that data analysis and provides the
output of an uncertainty analysis that was
performed for the risk significant event trees.

Input  Distributions 


The  event  trees  discussed  in  the  previous  sec- 
tions were  evaluated  using  a probabilistic  meth- 
odology for  uncertainty  analysis.  The  distribu- 
tion fitting  for  the  data  was  determined  to  be  not 
critical.  Thus,  if  a lognormal  or  Weibull  distri- 
bution is  selected  for  use  in  the  analyses,  the 
effect  of  the  selected  distribution  on  the  uncer- 
tainty results  is  minimal.  The  selected  distribu- 
tions are  then  one  of  three  types: 

Uniform. These are used for the values of
constants. For example, the engine shutdown is
assumed to never occur, i.e. no credit is given for
controller logic since it is outside the scope of
the MCC and thus this study. Since it occurs
with 0% probability it is assigned a uniform
distribution with both the lower and upper limits
set to 0, i.e. a constant.

Table XIV. Uncertainty Analysis Inputs For
MCC Event Trees
(Only one row is legible: PSE-PU-001, Normal, 99.00%, 99.33%, 99.29%.)

Normal.  This  is  the  standard  normal  density 
or  bell  shaped  curve. 

Weibull distribution. This is used to
approximate data that exhibit "long tails": that is, there
is a significant probability of the pivotal event
occurring with high frequency. It is important to
re-emphasize at this point that there are two
numbers of interest during an uncertainty
evaluation: the frequency of an event occurring and the
probability that the frequency selected is the
"true" frequency. For example, from Table XIII
the mean, or average, value of the initiating event
frequency for the FRI is 1 in 191 missions.
However, there is a wide spread in the data and,
therefore, while we believe this to be an average
value, we also believe that the value could be
between 1 in 109 and 1 in 772 missions. This
uncertainty in our degree of knowledge of the
true FRI initiating event frequency is represented
by the probability density function, in this case a
Weibull distribution.

Table XIV gives the results of all of the
distribution fits used in the uncertainty analyses.

Event  Tree  Uncertainty 
Analyses  Results 

The distributions shown in Table XIV were input
to the uncertainty analysis code for evaluation.
The result of the complete uncertainty analysis is
given in Figure 36. In this Figure we see that the
estimated loss of MCC frequency is between 1 in
3,000 missions and 1 in 800 missions. The 50%
value (which is not the mean value) is near 1 in
1,500 missions. This compares very favorably
with the point estimate, indicating that the
distributions are not causing a significant
skewing effect and that many are contributing equally
to the overall uncertainty. This is best seen by
an examination of the individual event tree
uncertainty analyses.
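
The way input distributions turn into a range of output frequencies, and why the 50% value need not equal the mean, can be sketched with a small Monte Carlo calculation. This is an illustration under stated assumptions, not the uncertainty analysis code used in the study: the single-sequence model, the Weibull shape of 1.5, and the normal parameters are hypothetical, and the Weibull scale merely echoes the 1-in-191 FRI figure quoted earlier.

```python
import random
import statistics


def sample_sequence_frequency(rng):
    """Draw one sample of a hypothetical failure sequence frequency."""
    # Initiating event frequency per mission: Weibull with a hypothetical
    # shape; random.weibullvariate takes (scale, shape).
    initiator = rng.weibullvariate(1.0 / 191.0, 1.5)
    # Pivotal event failure probability: normal, clipped to [0, 1].
    pivotal_failure = min(1.0, max(0.0, rng.gauss(0.01, 0.002)))
    return initiator * pivotal_failure


def run_uncertainty(n_samples=20000, seed=1):
    """Propagate the input distributions and summarise the output."""
    rng = random.Random(seed)
    samples = sorted(sample_sequence_frequency(rng) for _ in range(n_samples))

    def percentile(q):
        return samples[int(q * (n_samples - 1))]

    return {
        "p05": percentile(0.05),
        "p50": percentile(0.50),
        "p95": percentile(0.95),
        "mean": statistics.fmean(samples),
    }


if __name__ == "__main__":
    for key, value in run_uncertainty().items():
        print(f"{key}: 1 in {1.0 / value:,.0f} missions")
```

With a right-skewed input such as this Weibull, the mean of the output lies above its median, which is the distinction the text draws between the 50% value and the mean.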

Figure  37  shows  the  results  of  the  individual 
event  tree  uncertainty  analyses.  In  this  Figure 
the  overall  uncertainty  analyses,  shown  in  Figure 
36,  are  also  superimposed.  The  individual  event 
tree  uncertainty  analyses  indicate  that  the  mani- 
fold weld  anomaly,  aft  end  debond  of  the  liner, 
and  the  bolt  anomaly  make  up  a significant 
portion  of  the  uncertainty.  The  most  effective 
way  to  reduce  the  MCC  risk  is  better  inspections 
or  repairs  of  the  manifold  weld. 


Figure 36. MCC Event Tree Uncertainty Analyses

DATA BASE FOR ANOMALY AND FAULTS USED TO
DEVELOP INITIATING EVENT AND EVENT

CONTINUATION OF SPACE SHUTTLE PROBABILISTIC RISK ASSESSMENT, TASK 1

Technical  Report: 

An  Investigation  of  the  Risk  Implications  of  Space  Shuttle 
Solid  Rocket  Booster  Chamber  Pressure  Excursions 


1.0.  Introduction  and  Background. 

This document is a technical report on work by the Advanced Technology Division of Science
Applications International Corporation (SAIC), New York, NY and SAIC's subcontractor Safety
Factor Associates, Encinitas, CA to support an investigation of the risk implications of pressure
excursions observed on Space Shuttle Solid Rocket Boosters. The SRB Pressure Excursion
Assessment described herein is Task 1 of the continuation of the Space Shuttle Probabilistic Risk
Assessment (PRA) program sponsored by the Headquarters Office of Space Flight (Code M) of the
US National Aeronautics and Space Administration.


1.1. Background.

Post-flight analysis of the telemetry data on solid rocket booster internal pressure from Space
Shuttle Mission STS-54 in January 1993 revealed an apparent pressure excursion of approximately
13 psi* peak magnitude above nominal pressure and four seconds total duration on the "B"
booster, beginning at 67 seconds after SRB ignition. While slight pressure variations are a normal
feature of the solid-fuel-rocket burn process, pressure excursions in solid-fuel rocket motors
translate to thrust excursions, and therefore can impose a variety of hazards on the Shuttle vehicle
if they exceed a safe magnitude. Since the pressure transients appeared to be increasing
flight-to-flight in size, frequency, and variability, NASA became concerned about their potential
flight safety implications, and initiated a series of investigations of their cause(s) and effects on the
Shuttle.

Analysis  of  chamber  pressure  data  from  previous  firings  of  the  High-Performance  Motor  (HPM) 
SRB and the post-Challenger-accident Redesigned Solid Rocket Booster (RSRB) revealed that
similar,  although  smaller,  pressure  excursions  had  occurred  fairly  frequently  in  both  flight  and 
ground-test  motors.  A statistical  analysis  of  this  experience  led  NASA  to  conclude  that  the  pressure 
transient  was  well  within  the  envelope  of  the  experience  base  of  earlier  flights,  and  that  therefore 
the  next  scheduled  mission  (STS-55)  would  be  safe  to  fly.  While  STS-55  did  in  fact  fly 
successfully,  its  "A"  booster  experienced  a 13  psi*  pressure  excursion  at  approximately  72 
seconds.  This  repetition  added  urgency  to  the  need  to  understand  and,  if  necessary,  to  find  a way 
to  mitigate  the  pressure  transient  phenomenon. 

A number  of  candidate  mechanisms  for  generating  pressure  transients  have  been  postulated  and 
evaluated;  attempts  have  been  made  to  establish  upper  bounds  on  the  magnitude  of  the  associated 
thrust  excursions  through  a combination  of  statistical,  analytical,  and  empirical  methods;  ground 
tests  of  SRBs  with  special  instrumentation  for  the  pressure  transient  investigation  have  been 
conducted; and increasingly refined analyses have been performed to assess the effects of the
upper-bound  thrust  on  structural  stress  margins  and  vehicle  dynamics.  The  study  described  in  this 
report  continues  this  work  by  bringing  a probabilistic  risk  assessment  perspective  to  the  SRB 
pressure  excursion  investigation. 


* Note: these were the pressure observations initially reported, based on a 2-per-second sampling rate. 12.5-per-second data
that became available later showed peak excursions up to 15 psi.

1.2. Objectives.

The  general  objectives  of  the  SRB  Pressure  Excursion  Assessment  were  to  support  the  independent 
internal  review  of  the  SRB  pressure  excursion  phenomenon  chartered  by  the  NASA  Administrator 
by  providing  insights  into  the  risk  implications  of  the  pressure  excursion  situation,  to  prepare 
information on SRB risks that will be needed to support the more-comprehensive Space Shuttle
PRA  that  is  now  under  way,  and  to  demonstrate  the  benefits  of  probabilistic-risk-based  thinking 
processes  to  the  civil  space  enterprise. 


2.0. Discussion of Project Subtasks.

2.1. Subtask 1. Information Review and Risk Framework Development.

In Subtask 1 the SAIC-Safety Factor Associates (SFA) team obtained and reviewed the information
furnished by the Shuttle program to the NASA independent review team that met at Marshall Space
Flight Center during the week of 3 January 1994. In brief summary, this data set contained
briefing materials from pre- and post-flight reviews of the STS-55, STS-57, and STS-58 missions;
information on the TEM-10 and TEM-11 ground tests; briefing materials and responses to
questions prepared for both the independent internal review requested by the Administrator and the
external (Faget committee) review; and a variety of background information. Together with the
program's answers to clarifying questions, this information gave the SAIC-SFA team a reasonably
complete and detailed understanding of the process and results of the SRB pressure excursion
investigation.

The  information  the  team  reviewed  does  not  — and  is  not  intended  to  — deal  with  the  pressure 
excursion  phenomenon  as  one  of  many  potential  contributors  to  total  Shuttle  accident  risk.  NASA 
and  its  contractors  quite  properly  focused  on  the  causes  and  effects  of  the  pressure  transient 
phenomenon rather than its top-level risk implications. However, understanding the relative
contributions  of  potential  accident  initiators  to  total  risk  is  essential  to  making  sound  decisions 
concerning  the  allocation  of  scarce  resources  among  candidate  risk-reduction  approaches.  This  is 
one  of  the  key  reasons  for  performing  a PRA  on  the  Shuttle. 

The SAIC-SFA team began the process of placing the SRB pressure excursion data within a PRA
risk scenario structure by developing a preliminary Master Logic Diagram for catastrophic Shuttle
accidents during the mission phase in question. A Master Logic Diagram (MLD) is a special
logic  tree  that  identifies  all  of  the  credible  accident  initiating  events  that  lead  to  the  "top  event,"  but 
addresses  neither  pivotal  events  that  can  alter  the  progress  of  cause-effect  sequences  for  better  or 
worse,  nor  interactions  among  initiators  and  event  sequences,  nor  the  probabilities  of  the  initiating 
events.  (These  items  are  dealt  with  in  later  stages  of  the  analysis).  The  MLD  is  the  first  step  in 
constructing  accident  sequences  or  scenarios  that  can  then  be  analyzed  to  obtain  quantitative 
information  on  the  total  risk  and  the  relative  contributions  of  risk  factors. 

Appendix  1 contains  the  top-level  MLD  for  the  boost  phase  of  Shuttle  ascent,  showing  the  role  of 
SRB  pressure  and  thrust  transients  as  potential  initiators  of  Loss  of  Vehicle.  As  the  reader  will 
note,  these  are  the  only  initiators  that  are  called  out  specifically  on  this  preliminary  MLD;  the  other 
potential  initiators  are  left  undeveloped  (as  denoted  by  the  diamond-shaped  symbols),  and  will  be 
developed  later  during  the  main  Shuttle  PRA.  The  lower-level  branches  that  are  not  shown 
explicitly in Appendix 1 (denoted by triangular off-page-connector symbols containing numbers)
are similar to the analogous branches of NASA's "fault tree" for the pressure excursion
event (which is itself actually an MLD, as we note below). Appendix 2 contains the NASA "fault
tree." 

2.2. Subtask 2. Correlation of Solid Rocket Booster Pedigree Information with
Pressure Excursions.

Subtask  2 is  a correlation  analysis  that  searched  for  significant  relationships  between  the 
magnitude,  frequency,  and  variability  of  observed  SRB  chamber  pressure  transients,  and  the 
pedigree  and  history  of  the  SRBs  that  had  experienced  transients.  The  SAIC-SFA  team 
investigated  potential  correlations  between  the  following  factors  and  pressure  excursion  phenomena 
on  the  basis  of  the  information  furnished  by  NASA  and  its  SRB  contractors: 

• Casting sequence
• Firing sequence
• Storage time (interval between casting and firing)
• Ammonium perchlorate (AP) vendor
• Aluminum powder vendor
• SRB TVC gimballing just before or during pressure excursions.

Combinations  of  several  factors  were  considered  in  some  cases. 

Figures  1 and  2 show  some  of  the  most  interesting  and  potentially  significant  results  of  this 
subtask.  Figure  1 is  a scatter  plot  of  peak  pressure  transient  magnitude  versus  casting  date  for 
SRBs containing ammonium perchlorate (AP) from the three vendors: Pacific Engineering (PE),
Kerr-McGee (KM), and Western Electrochemical (WE, successor to PE after the PE plant was
destroyed in an accidental explosion). Figure 1 clearly shows that boosters loaded with WE AP
exhibit considerably higher pressure-transient magnitudes than those containing other vendors' AP,
as  also  noted  in  a number  of  NASA  analyses.  (A  T-test,  a standard  statistical  test  of  significance, 
demonstrates  that  the  differences  among  vendors  are  statistically  significant  at  more  than  99% 
confidence.) 

Figures  2a  and  2b  on  page  5 are  plots  of  a five-booster  moving  average  of  recorded  pressure 
transient  peaks  versus  propellant  motor  identification  number  (arranged  in  order  of  casting  date)  for 
SRBs  containing  AP  from  KM  and  WE  respectively.  (Averaging  over  five  motors  highlights 

Figure 1. Scatter Plot of Peak Pressure Transient Versus Casting
Date for SRBs Containing Ammonium Perchlorate from Three Vendors.
(Plot legend: WE, KM, PE.)

SAICNY93-01-1O 
Rev.  3 1/24/94 

trends in the data by filtering out small motor-to-motor variations.) Note the difference in trends
between the two plots. Pressure excursion magnitudes in KM SRBs have trended gradually
upward, and seem to have become somewhat more variable recently. However, WE SRB pressure
excursions were trending gradually downward until they showed a sudden and sharp increase
beginning at motor number 298. This suggests that a significant change occurred in some
characteristic that affects chamber pressure stability at that point. It is not yet clear whether the
change involved the AP material itself, its processing into finished SRBs, the treatment of the
SRBs between manufacture and launching, or the characteristics of the flights during which the
excursions occurred (or perhaps some combination of these).
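
The five-motor moving average used for the Figure 2 trend plots can be sketched as follows. This is a minimal illustration; the function name is hypothetical and the sample values are placeholders rather than recorded pressure data.

```python
def moving_average(values, window=5):
    """Return the simple moving average of `values` over `window` points.

    Each output element is the mean of `window` consecutive inputs, so the
    result has len(values) - window + 1 elements.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Placeholder peak values ordered by casting sequence (not real data).
    peaks = [1, 2, 3, 4, 5, 6, 7]
    print(moving_average(peaks))
```

Averaging over a wider window smooths more motor-to-motor scatter but delays and softens the appearance of an abrupt shift such as the one noted at motor number 298.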


2.3.  Subtask  3.  Development  of  Parameter  Uncertainty  Distributions. 

It  is  clear  that  thrust  is  the  solid  rocket  booster  performance  parameter  of  greatest  flight-safety  risk 
significance,  at  least  in  the  present  context  of  risk  imposed  by  SRB  pressure  excursions.  Therefore 
the SAIC-SFA team concentrated on developing uncertainty distributions for thrust. The basis of
this analysis is the following mission-specific SRM thrust equation that has been presented in
several of the briefing packages (e.g. "MSFC RSRM Pressure Blip and Dispersions," 11/10/93,
reproduced in Appendix 3 of this report), and that is apparently used to compute the normal and
upper-bound SRB thrust for flight certification of the external tank (ET).

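\[
F = F_{\text{BLOCK}} + \Delta F_{\text{BURN RATE}} + \Delta F_{\text{PMBT}} + \Delta F_{\text{OSC MEAN}} + \Delta F_{\text{IMB MEAN}}
  + \sqrt{\Delta F_{\text{SCALE FACTOR}}^{2} + \Delta F_{\text{NOM}}^{2} + \Delta F_{\text{PMBT UNC}}^{2} + \Delta F_{\text{SHAPE}}^{2} + \Delta F_{\text{P/P}}^{2} + \Delta F_{\text{OSC DISP}}^{2}} \qquad (1)
\]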


Figure 2a. Five-Motor Moving Average of Peak Pressure Transient Versus
Casting Sequence Number for SRBs Containing Kerr-McGee Ammonium
Perchlorate.

Figure 2b. Five-Motor Moving Average of Peak Pressure Transient Versus
Casting Sequence Number for SRBs Containing Western Electrochemical
Ammonium Perchlorate.


(The nomenclature is defined in Appendix 3.) This equation appears to be an essentially empirical
relation combining nominal ("block") thrust; several quasi-constant terms which adjust for expected
variations from nominal thrust due to propellant temperature, burn rate variability, etc.; and terms
reflecting uncertainties in most of the other terms. The latter set of terms is combined by using the
"root-of-the-sum-of-the-squares" (RSS) method into a single uncertainty term that is summed with
the others. (The SAIC-SFA team questions the appropriateness of the RSS method in this case, as
discussed in paragraph 3.3 below, but we will reserve that issue for later.)

In  order  to  develop  an  uncertainty  distribution  for  total  SRB  thrust,  the  uncertainty  terms  were 
represented  as  distributions  around  a mean,  and  grouped  with  the  terms  that  represent  their 
respective  means.  In  this  way  the  equation  is  restated  as... 

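$$F = (F_{\mathrm{BLOCK}} \pm \Delta F_{\mathrm{NOM}}) + (\Delta F_{\mathrm{BURN\,RATE}} \pm \Delta F_{\mathrm{SCALE\,FACTOR}}) + (\Delta F_{\mathrm{PMBT}} \pm \Delta F_{\mathrm{PMBT\,UNC}}) + (\Delta F_{\mathrm{OSC\,MEAN}} \pm \Delta F_{\mathrm{OSC\,DISP}}) + (0 \pm \Delta F_{\mathrm{SHAPE}}) + (0 \pm \Delta F_{\mathrm{P/P}}) + \Delta F_{\mathrm{IMB\,MEAN}} \qquad (2)$$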


Consistent with NASA's practice, and in the absence of contrary evidence, the distributions on
the  uncertainty  terms  were  assumed  to  be  normal  or  Gaussian.  The  standard  deviations  assigned  to 
the  distributions  depended  on  the  specific  circumstances.  This  equation  was  set  up  in  an  Excel 
4.0™  spreadsheet,  and  the  distributions  of  the  uncertain  terms  propagated  through  the  equation  to 
form  the  total  thrust  distribution  by  Monte  Carlo  simulation  using  Crystal  Ball™,  a commercial 
Monte  Carlo  simulation  tool  that  interfaces  directly  with  Excel.  Figures  4a  and  4b  in  paragraph  3.3 
show  outputs  for  several  simulation  cases,  and  the  accompanying  text  explains  their  significance. 
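
The propagation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's Excel and Crystal Ball model: the term groupings follow the restated equation (2), every numerical value is a hypothetical placeholder (the Appendix 3 values are not reproduced here), and each "±" term is taken to be the 3σ half-width of an independent normal distribution.

```python
import math
import random
import statistics

# Hypothetical placeholder values in lb; NOT the Appendix 3 numbers.
# Each entry is (mean contribution, 3-sigma half-width of its uncertainty).
TERMS = {
    "BLOCK +/- NOM":              (2_500_000.0, 60_000.0),
    "BURN RATE +/- SCALE FACTOR": (20_000.0, 50_000.0),
    "PMBT +/- PMBT UNC":          (10_000.0, 15_000.0),
    "OSC MEAN +/- OSC DISP":      (5_000.0, 10_000.0),
    "0 +/- SHAPE":                (0.0, 20_000.0),
    "0 +/- P/P":                  (0.0, 80_000.0),
    "IMB MEAN":                   (8_000.0, 0.0),
}


def rss_upper_bound(terms):
    """Sum of the means plus the root-sum-square of the 3-sigma half-widths."""
    mean_total = sum(mean for mean, _ in terms.values())
    rss = math.sqrt(sum(half ** 2 for _, half in terms.values()))
    return mean_total + rss


def monte_carlo_thrust(terms, trials=50_000, seed=1):
    """Sample total thrust, drawing every term from an independent normal."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0.0
        for mean, half in terms.values():
            total += rng.gauss(mean, half / 3.0)  # half-width is 3 sigma
        totals.append(total)
    return totals


samples = monte_carlo_thrust(TERMS)
mc_3sigma = statistics.fmean(samples) + 3.0 * statistics.pstdev(samples)
print(f"RSS upper bound:        {rss_upper_bound(TERMS):,.0f} lb")
print(f"Monte Carlo mean + 3sd: {mc_3sigma:,.0f} lb")
```

With independent, symmetric normal terms the two printed bounds should agree to within sampling error, which is the sense in which a base-case simulation can replicate an RSS calculation.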


3.0. Key Risk Issues.

The  SRB  Pressure  Excursion  Assessment  was  performed  partly  in  order  to  develop  risk-based 
insights  into  both  the  SRB  pressure  transient  phenomenon  and  the  Shuttle  flight  safety 
decision-making  process.  Accordingly  the  SAIC-SFA  team  identified  a number  of  key  risk  issues 
which  are  presented  below. 


3.1.  Fault  Tree  (Master  Logic  Diagram). 

Early in its investigation, NASA prepared what was characterized as a "fault tree" in order to
systematically identify and track all credible potential mechanisms for the production of SRB
pressure transients. This tree was presented in "STS-54 RSRM-29 Chamber Pressure Observation
Overview," 4 February 1993, and is reproduced in Appendix 2. After reviewing the "fault tree,"
the SAIC-SFA team concluded that while it is not really a fault tree in the sense in which that term
is normally used in the risk assessment community, it is in fact a reasonably comprehensive and
well-founded Master Logic Diagram (MLD) for the "top event" of SRB pressure transients. (As discussed
previously, an MLD identifies all of the credible events that can lead to the top event, but ignores
pivotal events, interactions among initiators and event sequences, and event probabilities.)
Therefore it will be possible to transfer much of the basic-events information and logic from the
NASA "fault tree" directly into the MLD for the main Shuttle Probabilistic Risk Assessment.


3.2. Deciding on the Acceptability of Pressure Excursions Based on Statistical
Analysis of Pressure Excursion Experience.

NASA  has  consistently  used  a statistical  analysis  of  the  experience  base  of  pressure  excursions 
observed  during  flight  and  ground  test  firings  of  SRBs  to  determine  what  pressure  transients  (and 
indirectly  what  thrust  transients)  are  considered  "normal"  and  thus  acceptable.  (See,  for  example, 
the  briefing  materials  reproduced  in  Appendix  4.)  In  essence,  the  procedure  is  to  fit  an  assumed 
Gaussian probability distribution to the pressure excursion observations to date, and take the upper
bound of "normal" pressure excursions to be the mean of this distribution plus a factor k times the
standard deviation, where k is selected to assure an acceptably low probability that the bound will
be exceeded at an acceptably high statistical confidence level. In some instances k = 3.0 is used, as
in standard aerospace practice, while in others k appears to have been selected to achieve acceptable
confidence. Whether "3σ" or "kσ" is used is irrelevant to the point at hand.

The  effect  of  this  approach  is  to  widen  the  envelope  of  pressure  excursions  that  are  considered 
normal  and  acceptable  every  time  a transient  occurs  that  significantly  exceeds  the  range  of  recent 
observations.  Figure  3 illustrates  this.  It  depicts  the  pressure  excursions  observed  on  SRBs 
loaded  with  WE  ammonium  perchlorate,  plotted  against  motor  identification  number  in  order  of 
casting sequence. For each SRB, the mean and the 3σ bounds of a normal distribution fitted to the
set  of  pressure  transients  observed  on  motors  up  to  and  including  the  motor  in  question  are  also 
plotted. 

Consider the example of boosters 29B and 30A, which flew on Missions STS-54 and STS-55
respectively. Just before STS-54, the 3σ limit was approximately 14.5 psi. When the STS-54
observation was added it grew to about 16 psi. This was taken to mean that the 13 psi excursion
on STS-54, while unprecedentedly large, was within the range to be expected considering the
experience base, and therefore was not a matter of serious concern. When the second 13 psi
transient on STS-55 was absorbed into the experience base, the 3σ limit rose to about 17.5 psi,
implying that the STS-55 transient was even farther from the outer bound based on experience and
thus even less of a concern than the similar transient on the previous mission.
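
The widening mechanism can be reproduced with a short Python sketch. The excursion history below is invented purely for illustration (it is not the WE motor data plotted in Figure 3), and the helper name `three_sigma_limit` is hypothetical; the point is only that refitting the mean and standard deviation after each anomaly moves the "normal" envelope outward.

```python
import statistics

# Hypothetical peak pressure excursions (psi) in casting order; these are
# illustrative values, not the observed SRB experience base.
history = [4.0, 6.5, 3.0, 7.0, 5.5, 2.5, 8.0, 4.5, 6.0, 3.5, 7.5, 5.0]
new_observations = [13.0, 13.0]  # two successive large transients


def three_sigma_limit(values):
    """Upper bound of 'normal' experience: mean plus 3 sample std deviations."""
    return statistics.fmean(values) + 3.0 * statistics.stdev(values)


limits = [three_sigma_limit(history)]
experience = list(history)
for excursion in new_observations:
    experience.append(excursion)  # the anomaly joins the experience base
    limits.append(three_sigma_limit(experience))

for label, limit in zip(["before", "after first", "after second"], limits):
    print(f"3-sigma limit {label:>12}: {limit:5.1f} psi")
```

In this made-up history the first large transient lies outside the prior envelope but inside the refitted one, and the second transient pushes the envelope out further still.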

[Figure 3. Legend: 3-sigma bound.]


This  approach  presents  three  problems.  First,  it  is  based  on  the  unstated  assumption  that  all 
pressure  excursions  are  part  of  a single  population  differing  only  in  magnitude.  However,  the  large 
positive  pressure  excursions  that  are  the  subject  of  this  study  appear  to  be  qualitatively  different 
from  the  minor  fluctuations  around  the  nominal  pressure  that  comprise  most  of  the  experience  base. 
This implies that incidents of these two kinds are not part of the same population and should not be
treated  statistically  as  if  they  were.  Second  and  more  generally,  the  approach  provides  a 
mechanism  for  safety  margins  to  be  gradually  eroded  through  a series  of  incremental  decisions 
without  a thorough  engineering  review  of  the  overall  risk  implications  of  each  decision.  Third,  it 
tends  to  mask  genuine  failure  precursors  by  making  them  appear  to  be  part  of  a continuum  of 
normal  experience.  (A  "failure  precursor"  is  any  observed  abnormal  condition  that  can  credibly 
lead  to  catastrophic  failure  if  it  occurs  again  with  somewhat  greater  severity  or  when  the  ability  of 
the  system  to  respond  to  it  is  impaired.) 


3.3.  Solid  Rocket  Motor  Thrust  Equation. 

As mentioned earlier, the dispersed thrust equation NASA uses to estimate SRB thrust loading for
structural and dynamics calculations uses the root-sum-square (RSS) method to combine the
variabilities of the thrust components that are subject to variability into a single term, which is then
summed with several other terms. However, the validity of the RSS method of
combining variabilities depends on the variabilities being random, symmetrically distributed, and
independent. As far as the SAIC-SFA team can determine, none of these conditions is necessarily
satisfied for the uncertainty terms of the SRB thrust equation, for the following reasons. First, the
sources of uncertainty appear to contain some systematic variations, e.g., the variation of thrust
excursion magnitude and frequency with AP vendor, and therefore the variabilities are not
necessarily random. Second, the sources of uncertainty appear to arise from physical causes which
may not necessarily be characterized by symmetrical distributions. Third, several of the uncertainty
terms appear likely to be correlated rather than independent. Furthermore, the violations of the
conditions for using RSS are non-conservative in most cases. It seems clear that the RSS method
is not appropriate for this case, and using it can potentially increase risk by understating the upper
bound of expected thrust and thus decreasing structural margins of safety.

As discussed in paragraph 2.3 above, in order to investigate the risk implications of this situation
the SAIC-SFA team formed explicit uncertainty distributions for SRB thrust by constructing
uncertainty distributions for the variability terms of the thrust equation and propagating them
through a reformulated version of the thrust equation using Monte Carlo simulation. The first,
base-case simulation replicated NASA's RSS calculations for the numbers given in the example in
Appendix 3, taking the "Δ" terms to be the 3σ values of normal distributions. This case
demonstrates that the RSS method gives correct results if the necessary conditions for its use are
fulfilled. The team then investigated the impact of violating the conditions by running several
sensitivity cases in which distributions that were (1) constant over part of their ranges (hence not
random), (2) skewed (hence not symmetrical), and (3) mutually correlated (hence not independent)
were substituted for the independent Gaussian distributions of $\Delta F_{\mathrm{NOM}}$ and $\Delta F_{\mathrm{SCALE\,FACTOR}}$, the two
largest variability terms in the original simulation. Comparing the resulting thrust distributions with
each other and with the base case that replicated the RSS calculations showed that non-randomness
and non-symmetry of the distributions had very little effect on the outcome, at least with the
moderate violations assumed in this study, but that correlation (non-independence) had a substantial impact on the
critical right-hand "tail" of the thrust distribution. Furthermore, the effect of using the RSS method
to combine correlated variability terms is always non-conservative (i.e., resulting in lower predicted
thrust than the Monte Carlo simulation that accounts for correlation). These topics are discussed in
detail in Appendix 5.
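
The reason positive correlation is non-conservative for RSS can be seen from the variance of a sum of two terms, σ² = σ₁² + σ₂² + 2ρσ₁σ₂, which reduces to the RSS result only when ρ = 0. The following Python sketch checks that relation by simulation. The two 1σ values are hypothetical stand-ins for the two largest variability terms, and reading "75% correlated" as a correlation coefficient of ρ = 0.75 is an assumption.

```python
import math
import random
import statistics


def combined_sigma(s1, s2, rho):
    """Std deviation of the sum of two normal terms with correlation rho."""
    return math.sqrt(s1 ** 2 + s2 ** 2 + 2.0 * rho * s1 * s2)


def simulate_sum(s1, s2, rho, trials=50_000, seed=7):
    """Monte Carlo check: build the correlated pair from two independent draws."""
    rng = random.Random(seed)
    sums = []
    for _ in range(trials):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1 = s1 * z1
        x2 = s2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        sums.append(x1 + x2)
    return statistics.pstdev(sums)


# Hypothetical 1-sigma values (lb) for the two largest variability terms.
SIGMA_NOM, SIGMA_SCALE = 20_000.0, 16_000.0

for rho in (0.0, 0.75, 1.0):
    analytic = combined_sigma(SIGMA_NOM, SIGMA_SCALE, rho)
    sampled = simulate_sum(SIGMA_NOM, SIGMA_SCALE, rho)
    print(f"rho={rho:4.2f}  analytic={analytic:9.0f} lb  sampled={sampled:9.0f} lb")
```

The ρ = 0 row is the RSS value; any positive ρ gives a larger combined standard deviation and therefore a higher 3σ thrust than RSS predicts.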

Figure 4a. Dispersed Thrust Uncertainty Distribution for the Base
Simulation Case Replicating the Example That Uses the RSS Method.


Figures 4a and 4b show the frequency distributions of dispersed SRB thrust for the base case and
the correlated-terms case respectively, as generated by the Crystal Ball Monte Carlo simulation tool.
The base and correlated-terms distributions in these figures superficially appear similar, but Figures
5a and 5b highlight the critical difference between them by illustrating how the non-conservative
error of using the RSS method to combine correlated variabilities can affect the margin of safety of
the critical parts of the external tank structure. The three distributions in 5a and 5b are Gaussian
distributions plotted from the parameters given by three Monte Carlo simulation cases. In each
figure the distribution labeled "uncorrelated" was derived from the base case that replicates the RSS
version of the thrust equation (Figure 4a); the "somewhat correlated" distribution came from the
case shown in Figure 4b, where $\Delta F_{\mathrm{NOM}}$ and $\Delta F_{\mathrm{SCALE\,FACTOR}}$ were assumed to be 75% correlated; and
the "100% correlated" distribution was based on a case in which all uncertainty terms in the thrust
equation were assumed fully correlated with one another. (The latter case puts an upper bound on
the factor-of-safety effect to be expected from replacing the RSS method with a more rigorous
method of propagating uncertainties.) Figure 5a shows the thrust distributions on a large scale,
while Figure 5b focuses in on the right-hand "tails."

Figure 4b. Dispersed Thrust Uncertainty Distribution for the Sensitivity
Simulation Case in which $\Delta F_{\mathrm{NOM}}$ and $\Delta F_{\mathrm{SCALE\,FACTOR}}$ Are Assumed Correlated at 75%
Correlation Factor.

Looking first at Figure 5a, note that, as expected, increasing the correlation among variability
terms increases the dispersion of the thrust distribution and thus raises the 3σ upper bound on
thrust. The rightmost vertical arrow at approximately $6.8 \times 10^{6}$ lbs in Figure 5a represents the
ultimate failure-point thrust used in NASA's example, which corresponds to a safety factor of 1.28
applied to the 3σ point of the base RSS-derived thrust. Also shown are the 3σ (99.87%) upper
thrust bounds for the uncorrelated, somewhat correlated, and 100% correlated cases. Now refer to
Figure 5b, which shows the right-hand "tails" of the thrust distributions in more detail. Note that
when two variability terms of the thrust equation are assumed to be somewhat correlated, the factor
of safety drops from 1.28 (the minimum requirement in the example) to 1.276. In the worst case in
which all variability terms are assumed 100% correlated, the factor of safety is only 1.217. The key
point here is that if the minimum acceptable factor of safety is 1.28 based on the 3σ value of the
thrust, and the thrust calculated by the RSS method barely satisfies this requirement, then the thrust
calculated by a method such as Monte Carlo simulation that correctly accounts for correlations
among sources of variability provides a negative margin of safety.
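
The arithmetic behind these factors of safety can be made explicit. Because the failure-point thrust is fixed at 1.28 times the RSS-derived 3σ thrust, a reduced factor of safety implies a specific rise in the 3σ thrust. The short Python sketch below works this out; the function names are hypothetical, and the margin-of-safety definition used (actual factor divided by required factor, minus one) is the conventional one and is assumed here rather than taken from the review material.

```python
REQUIRED_FS = 1.28  # minimum factor of safety in NASA's example


def implied_thrust_increase(new_fs, base_fs=REQUIRED_FS):
    """Fractional rise in the 3-sigma thrust that lowers base_fs to new_fs.

    The failure-point thrust is fixed at base_fs times the RSS-derived
    3-sigma thrust, so FS_new = base_fs * F_rss / F_new.
    """
    return base_fs / new_fs - 1.0


def margin_of_safety(actual_fs, required_fs=REQUIRED_FS):
    """Margin of safety relative to the required factor (negative = shortfall)."""
    return actual_fs / required_fs - 1.0


for label, fs in [("somewhat correlated", 1.276), ("100% correlated", 1.217)]:
    print(f"{label:>20}: 3-sigma thrust up {implied_thrust_increase(fs):6.2%}, "
          f"margin {margin_of_safety(fs):+.2%}")
```

Both correlated cases give a negative margin, since any factor of safety below the required 1.28 is a shortfall by definition.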

In addition to the inappropriateness of the RSS method, the SAIC-SFA team has serious concerns
about the validity of the method used to establish the 3σ upper bound for the pressure-spike term in the
thrust equation. NASA appears to have performed a statistical analysis of 66 previous RSRM
pressure traces to derive a 3σ upper bound for future pressure spikes. As best the team can
reconstruct, the following procedure was followed:

1. The sample population pressure traces were divided into one-second increments.

2. A normal distribution was assumed for the pressure distribution over 66 motors at each time
increment.

3. A mean and standard deviation (σ) were obtained at each pressure increment.

4. The maximum 3σ value occurred at time 69 seconds. This was 20 psi above the mean.

5. It was assumed that this 20 psi excursion could be generated at any time increment.

6. The ratio of the 3σ value to the sample mean at each time increment was calculated and plotted
as a percentage.

7. At 69 and 71 seconds this ratio was about 3.2%. This was converted to thrust (about 80,000
lb) and was used in the empirical thrust equation as the pressure-spike ΔF term discussed above.

Figure 5b. Right-Hand "Tails" of the Correlated and
Uncorrelated Thrust Distributions, with Corresponding Factors of Safety
(horizontal axis: SRB total thrust).

The SAIC-SFA team performed an independent statistical assessment assuming a normal
distribution at 69 and 71 seconds using the pressure plots found in the review material (reproduced
in Appendix 6). The mean values were found to be 632 and 634 psi respectively, which correspond
to the plotted mean values from the program. The standard deviation at 69 seconds was found to be
σ(69) = 4.6 psi. The standard deviation at 71 seconds was found to be σ(71) = 4.4 psi. Combining
both populations provided a σ = 4.5 psi. The 20 psi "upper bound" pressure transient used by the
program corresponds to about 4.4σ, not 3σ. There is no apparent explanation for this discrepancy;
perhaps a normal distribution was not used (although it was stated that a normal was used).

Furthermore, if a 20 psi transient is a 4.4σ event, then a 13 psi excursion is approximately a 3σ
event (assuming a normal distribution was in fact used), which implies that its frequency is
approximately $1.4 \times 10^{-3}$ per firing, or less than one in 700 firings. This appears incompatible with
the observed experience of two 13 psi excursions in 123 flight and test firings of the HPM and
RSRM generations of the Shuttle SRB.
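
These figures can be checked directly. The Python sketch below converts the 20 psi and 13 psi excursions to multiples of the pooled σ = 4.5 psi, evaluates the normal upper-tail probability of a 3σ event, and then asks how likely two or more such events would be in 123 firings. Treating the firings as independent trials with a common per-firing probability is an assumption made here for illustration.

```python
import math

SIGMA_PSI = 4.5   # pooled standard deviation from the 69 s and 71 s data
FIRINGS = 123     # HPM and RSRM flight and test firings


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def prob_at_least_two(n, p):
    """Binomial probability of two or more occurrences in n independent trials."""
    return 1.0 - (1.0 - p) ** n - n * p * (1.0 - p) ** (n - 1)


z_20 = 20.0 / SIGMA_PSI      # the program's "upper bound" in sigmas
z_13 = 13.0 / SIGMA_PSI      # the observed excursions in sigmas
p_3sigma = upper_tail(3.0)   # per-firing frequency of a 3-sigma event

print(f"20 psi = {z_20:.1f} sigma, 13 psi = {z_13:.1f} sigma")
print(f"P(exceed 3 sigma) per firing = {p_3sigma:.2e}")
print(f"P(two or more in {FIRINGS} firings) = "
      f"{prob_at_least_two(FIRINGS, p_3sigma):.3f}")
```

Under these assumptions, observing two such excursions in 123 firings would have a probability on the order of one percent, which is the sense in which the observed experience appears incompatible with the assumed distribution.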

Finally, the NASA analysis divided the population into one-second increments. This implies that
each time increment was considered an independent population. This assumption is very difficult to
justify. The data show that the time to each pressure transient is nearly random in the time interval
64 to 80 seconds, which suggests that all data within at least that time interval should be combined.
Furthermore, phenomenological investigations indicate good reasons for the slag/slosh scenario to
produce transients during this interval, but independently of time during the interval. The reasons
stem from propellant burn patterns that begin to allow slag to collect in the bore or nozzle
beginning at about 65 seconds, as well as considerations of pitch and gimballing that provide a
mechanism for spilling the slag. Again the team sees no reason to believe that each time increment
is an independent population. It is likely that a statistical study that combines the data over the 64 to
80-second interval would be valid and would produce a larger 3σ "upper bound."


3.4. Handling of External Tank Structural Safety Factors.

NASA's current method of determining the required safety factor (SF) for limits on external tank
(ET) structural loads involves scaling the SF between 1.40 and 1.25 according to the proportion of
the total load that is "not well understood" (i.e., highly uncertain) versus "well understood" (i.e.,
relatively certain). (Refer to the briefing materials in Appendix 7 for an explanation of the
procedure.) However, the NASA method appears to proportion the safety factors according to the
magnitude of the expected load, not to the uncertainty of the load, although the SF is intended to
account for the variability above the expected load rather than its magnitude. It seems clear that if
SFs are to be scaled by some general rule related to load uncertainty, they should be proportioned
according to an appropriate uncertainty measure (perhaps standard deviation or variance)
instead of load magnitude.
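
The difference between the two weighting schemes can be illustrated with a small Python sketch. The exact NASA procedure is given in Appendix 7 and is not reproduced here, so the linear interpolation between 1.25 and 1.40 below is an assumption, as are the function name and the load values, which are hypothetical and in arbitrary units.

```python
SF_WELL_UNDERSTOOD = 1.25
SF_NOT_WELL_UNDERSTOOD = 1.40


def scaled_safety_factor(well, not_well):
    """Interpolate the SF by the 'not well understood' share of well + not_well."""
    share = not_well / (well + not_well)
    return SF_WELL_UNDERSTOOD + share * (SF_NOT_WELL_UNDERSTOOD - SF_WELL_UNDERSTOOD)


# Hypothetical load case (arbitrary units): (expected load, standard deviation).
well_understood = (900.0, 20.0)
not_well_understood = (100.0, 60.0)

sf_by_magnitude = scaled_safety_factor(well_understood[0], not_well_understood[0])
sf_by_variance = scaled_safety_factor(well_understood[1] ** 2,
                                      not_well_understood[1] ** 2)

print(f"SF weighted by load magnitude: {sf_by_magnitude:.3f}")
print(f"SF weighted by load variance:  {sf_by_variance:.3f}")
```

In this invented case a small but highly uncertain load barely moves the magnitude-weighted SF above 1.25, while the variance-weighted SF lands close to 1.40.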


4.0. Conclusions and Recommendations.


This section contains the conclusions and recommendations of the SRB Pressure Excursion
Assessment. It must be emphasized that they came out of a quick-response analysis driven by
urgent Shuttle flight schedule considerations. Some of them may be modified by a more
comprehensive and systematic risk analysis such as the main Shuttle Probabilistic Risk Assessment
of which this study is a preliminary part.


4.1. Conclusions.

1.  The  SRB  pressure  excursion  phenomenon  increases  Shuttle  flight  safety  risk  to  some  degree  by 
potentially  initiating  at  least  the  accident  scenarios  listed  below. 

(a)  A transient  over-thrust  in  one  or  both  SRBs  which  exceeds  the  structural  capabilities  of  the 
external  tank  causes  vehicle  breakup. 

(b)  A severe  transient  thrust  imbalance  between  the  two  SRBs  that  exceeds  the  structural 
capabilities  of  the  external  tank  causes  vehicle  breakup. 

(c)  A severe  transient  thrust  imbalance  between  the  two  SRBs  that  is  not  recoverable  by  flight 
controls  results  in  an  unacceptable  flight  attitude,  causing  vehicle  breakup  due  to  excessive 
aerodynamic  forces. 


(d)  A severe,  sustained  transient  thrust  imbalance  between  the  two  SRBs  that  is  not  recoverable 
by  flight  controls  results  in  loss  of  directional  control,  exceedance  of  range  safety  guidelines,  and 
flight  termination  by  the  range  safety  officer. 


(e) A severe chamber pressure transient induces a hot-gas leak at an SRB joint that impinges on
the ET, causing an ET explosion.

(f)  A severe  chamber  pressure  transient  ruptures  the  SRB  case. 

2.  It  is  impossible  to  quantify  the  risks  of  these  scenarios  with  the  limited  information  available  to 
the  SAIC-SFA  team  in  Task  1.  (The  main  Shuttle  probabilistic  risk  assessment  is  intended  to 
accomplish  this.)  However,  scenarios  (c)  through  (f)  appear  to  be  of  negligible  probability,  at 
least  to  the  extent  that  they  are  initiated  by  SRB  pressure  excursions,  chiefly  because  it  is  difficult  to 
conceive  of  a mechanism  for  producing  thrust  or  pressure  excursions  of  the  necessary  magnitude 
and  duration. 


3. There is a statistically significant correlation between the use of ammonium perchlorate supplied
by Western Electrochemical (WE) in SRB solid fuel, and the frequency of large, positive pressure
transients.  The  SAIC-SFA  team  could  not  draw  any  conclusions  about  the  reason(s)  for  this 
correlation  from  the  data  available  to  us. 


4.  Trending  of  peak  pressure  excursions  against  the  SRB  casting  sequence  suggests  that  an  abrupt 
change  in  some  characteristic  of  motors  containing  WE  ammonium  perchlorate  that  affects  internal 
pressure  occurred  at  motor  number  29B.  The  available  data  do  not  support  any  conclusions  as  to 
what  this  change  might  have  been. 


5. Based on the material provided for review, the SAIC-SFA team has conceptual and technical
concerns about NASA's methodology in these four specific areas:

(a) treating all pressure excursions in the SRB experience base as a single population for the
purpose of statistical analysis in order to determine what pressure transients (and indirectly what
thrust transients) are considered "normal" and thus acceptable, although the large positive pressure
excursions that are the subject of this study appear to be qualitatively different from the minor
fluctuations around the nominal pressure that comprise most of the experience base;


(b) using the "root-of-the-sum-of-the-squares" (RSS) method to combine the variabilities of
the terms of the SRB dispersed thrust equation that account for uncertainties in thrust, although
there is considerable doubt that the necessary conditions for the validity of that method are fulfilled;


(c)  dividing  the  SRB  pressure  trace  experience  base  into  one-second  time  increments  which  were 
analyzed  separately,  which  implies  that  the  pressure  traces  in  these  increments  comprise  separate 
populations,  although  both  historical  data  and  phenomenology  suggest  that  the  set  of  pressure 
traces  within  the  interval  when  pressure  transients  occur  is  part  of  a common  population;  and 


(d) establishing the minimum structural safety factor for the external tank by scaling the SF
between 1.40 and 1.25 according to the proportion of the total load magnitude that is "not well
understood" (i.e., highly uncertain) versus "well understood" (i.e., relatively certain), rather than
according to a quantitative measure of the uncertainty of these categories of loads.

All  of  these  problems  can  potentially  lead  to  non-conservative  assessments  of  safety  and  hence  to 
increases  in  Shuttle  flight  risk. 


6. More generally, the team had concerns with the flight safety decision process as depicted in the
review materials. NASA appears to have used a "3σ" or "kσ" envelope derived by fitting an
assumed  Gaussian  distribution  to  the  record  of  pressure  observations  in  order  to  define  the  limits  of 
"normal"  and  thus  acceptable  SRB  pressure  transients.  (The  SAIC-SFA  team’s  experience 
suggests  that  this  is  a common  practice  that  is  not  restricted  to  the  SRB  pressure  excursion  issue.) 
The  problem  with  this  approach  is  that  each  anomalous  occurrence  becomes  part  of  the  experience 
base  and  thus  widens  the  range  of  behavior  considered  normal,  which  can  mask  genuine  failure 
precursors  by  making  them  appear  to  be  part  of  a continuum  of  normal  experience.  Making  flight 
safety  decisions  on  this  basis  provides  a mechanism  for  safety  margins  to  be  gradually  eroded 
through  a series  of  incremental  decisions  without  a thorough  engineering  review  of  the  overall  risk 
implications  of  each  decision. 


7.  Still  more  generally,  while  NASA  and  its  contractors  have  done  an  excellent  root  cause  analysis 
of  the  SRB  pressure  transient  phenomenon,  with  the  wisdom  of  hindsight  the  issue  seems  to  have 
been  handled  in  a somewhat  disorganized,  ad  hoc  fashion  that  was  driven  largely  by  the  need  to 
make timely flight readiness decisions in the absence of complete information. The SAIC-SFA
team believes that much of the disorganization could have been avoided if the Shuttle program had
been  able  to  take  advantage  of  a flight-safety  decision  process  based  on  a systematic,  quantitative 
consideration  of  risk. 


4.2. Recommendations.

1.  NASA  should  consider  the  conceptual  and  technical  concerns  raised  in  Section  3.0,  "Key  Risk 
Issues,"  some  of  which  appear  to  be  generic  to  the  agency  and  its  contractors.  Specifically,  the 
SAIC-SFA  team  recommends  that  NASA  consider  the  following  changes  in  its  current  practices  as 
described  in  the  data  furnished  for  the  SRB  Pressure  Excursion  Assessment: 


(a) NASA should reformulate the dispersed thrust equation that is used to determine SRB thrust
loadings on the external tank structure in a way that avoids using the RSS method unless that
method is rigorously shown to be valid, and fully accounts for the observed and potential actually
occurring pressure transients.

(b) NASA should perform a statistical analysis of historical SRB pressure data that uses
12.5-samples-per-second data in lieu of 2-samples-per-second data and treats the data in the 64 to
80 second time interval as a single population.

(c) At a minimum, NASA should revise its method of determining minimum structural safety
factors for the external tank so as to apportion safety factors according to the ratio of uncertainties
in the "well-understood" and "not-well-understood" load categories, rather than according to the
magnitudes of the expected loads.

(d)  Better  still,  in  view  of  the  progress  in  our  understanding  of  probabilistic  structural  mechanics 
and  the  development  of  powerful  probabilistic  structural  analysis  tools  since  the  inception  of  the 
Shuttle  program,  NASA  should  abandon  the  safety  factor  concept  in  favor  of  rigorous 
structure-by-structure  probabilistic  structural  analysis  as  a basis  for  Shuttle  flight  certification.  This 
recommendation will become especially important if, as seems likely, the external tank is
further  lightened  by  cutting  back  on  structural  margins  or  the  Shuttle  is  called  on  to  fly  more 
structurally-demanding  trajectories. 

2.  Because  ET  structural  failure  appears  to  be  the  dominant  mechanism  of  potential  Shuttle  loss  due 
to SRB chamber pressure excursions, and SRB thrust rather than chamber pressure is the direct
driver  of  structural  failure,  NASA  should  consider  installing  high-fidelity  force  (thrust) 
instrumentation  on  the  forward  attachments  between  the  SRBs  and  the  ET  for  the  next  few  flights 
in  order  to  better  characterize  the  thrust  transient  phenomenon. 

3.  In  view  of  the  conclusions  above,  NASA  should  proceed  expeditiously  with  its  planned 
comprehensive  probabilistic  risk  assessment  of  the  Shuttle  system.  This  study  will  determine  how 
SRB  pressure  transients  rank  relative  to  other  risk  contributors,  and  thus  whether  continuing 
expensive and time-consuming efforts to investigate them is a good investment of limited resources;
more  generally,  it  will  lay  a sound  foundation  for  a quantitative  risk-based  flight  readiness  decision 
process  for  the  future. 


Appendix 1.

Preliminary Top-Level Master Logic Diagram for Loss
of Shuttle Vehicle during Shuttle Boost-Phase Ascent,
Highlighting SRB Pressure and Thrust Transients as
Accident Sequence Initiators.


Appendix 2.

NASA "Fault Tree" (Master Logic Diagram) for Pressure
Excursions (excerpt from "STS-54 RSRM-29 Chamber
Pressure Observation Overview," 4 February 1993)


Appendix 3.

SRM Dispersed Thrust Equation and
Example of Thrust Calculation
(excerpt from "MSFC RSRM Pressure Blip
and Dispersions," 10 November 1993)


Appendix 4.

Examples of Use of Statistical Analysis of SRB
Pressure Transient History in Flight Safety Decisions


Excerpts from:

• "STS-54 Pressure Perturbation Investigation PRCB Presentation," 4
February 1993

• "In-Flight Anomaly Summary" for STS-54 Right RSRM Chamber
Pressure Spike
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in-flight  anomaly  summary 

(CONTINUATION SHEET)

12. INVESTIGATION SUMMARY (continued): RSRM-29B occurrence is very low. This suggests that
either the castable inhibitor comes off in large pieces or that a combined scenario ([illegible]
with bore blockage) is required [illegible].

In order to bound this scenario a conservative approach was developed, where it was assumed
that the higher pressure perturbation on RSRM-29B was due to a special cause of variation. To
arrive at an upper limit thrust imbalance for this scenario, the following thrust imbalance data
[illegible]:

• Castable inhibitor loss exposing the propellant     74 klb
• Bore blockage                                      100 klb
                                               RSS = 124 klb


This scenario is bounded by the amount of castable inhibitor that can be expelled out of the
[illegible] and the maximum propellant unbonded, which adds surface area burning with higher
pressure in addition to restricting the bore.


PROBLEM SOLUTION: In general, the occurrence of the pressure spike is not a concern for the
following reasons:


1) [illegible] SRM, HPM, FWC, and RSRM motor pressure traces show that each type of
[motor] exhibited pressure variations after 50 seconds from ignition. The measured
[illegible] in time and magnitude. However, all pressure traces show the "blip"
phenomenon and have remained within specification limits. All flight motor pressure
[illegible] greater than [illegible] psi have occurred in the 65- to 75-second burn time range.
[illegible] are characteristic of the motor ballistics and are exhibited over all the
process and vendor variations. Pressure perturbations are observed with
[illegible] and [illegible] conditions and have existed on flight and static test motors with and
[illegible] vectoring duty cycles. Assessment of flight history shows that the frequency
[illegible] of pressure perturbations increased subsequent to the STS-35 (RSRM-11)
timeframe.


2) [illegible] showing [illegible] of motors exhibiting blips compare favorably to static test motors.
[illegible] All flight motors are within the expected pressure range.
RSRM-29B was within family of RSRM history and met all performance requirements.

3) [illegible] and associated inspection data found no shifts or trends that could
contribute to pressure perturbations.

4) [illegible] magnitude pressure blips has increased in frequency since RSRM-14.
Out of 33 motors, 18 have exhibited this condition.
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IN-FLIGHT  ANOMALY  SUMMARY 

(CONTINUATION  SHEET) 


13. PROBLEM SOLUTION (continued):

5) A statistical analysis of a conservative 3 sigma event (RSRM-29B was 2.3 sigma) shows that
the maximum potential pressure perturbation is 18.6 psi. The thrust imbalance associated
with that perturbation is 75 klb. The probability of one SRB having a spike event is
1.35 x 10^-3, or one in 740 motors. The probability of a pressure spike occurring on both SRBs
of a flight set is 1.8 x 10^-6, or one in 550,000 flights. The probability of a pressure spike
occurring on both motors at the same time is 1.8 x 10^-8, or one in 55,000,000 flights.
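
The first two probabilities in item 5 can be checked with a short calculation. The sketch below assumes, as the quoted numbers suggest, that the single-motor figure is the one-sided upper tail of a normal distribution beyond 3 sigma and that the two motors of a flight set are treated as independent; the function and variable names are illustrative only. The "same time" figure adds a further factor of about 1/100 whose derivation is not given in this excerpt, so it is not reproduced.

```python
import math


def upper_tail_probability(n_sigma: float) -> float:
    """One-sided probability that a standard normal variable exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


# Probability of a 3-sigma (or larger) spike on a single motor.
p_one_motor = upper_tail_probability(3.0)

# Probability of such a spike on both motors of a flight set,
# ASSUMING the two motors are statistically independent.
p_both_motors = p_one_motor ** 2

print(f"one motor:   {p_one_motor:.3e}  (about 1 in {1.0 / p_one_motor:,.0f})")
print(f"both motors: {p_both_motors:.3e}  (about 1 in {1.0 / p_both_motors:,.0f})")
```

The independence assumption matters: if spikes on the left and right motors are positively correlated, the joint probability is larger than the simple product.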


All RSRM process and materials changes and variations that could contribute to the observed
pressure perturbations are under review. Additional tests and analysis efforts have been initiated
to understand material and process parameters that influence the generation of pressure
perturbations.

• Material and process characterization tests on the castable inhibitor

• Additional TEM-10 instrumentation

• Real Time Radiography (RTR)

• Infrared cameras

• Additional accelerometers

• Additional strain gages

• High speed cameras

• Cold flow tests and computational fluid dynamics analyses

The results of these tests and studies will be used to define process or material corrective
actions that could reduce pressure trace roughness.

The problem report has been deferred based on the following conclusions:

• It is concluded that pressure perturbations are a general characteristic of motor ballistics.
Flight performance has not violated requirements for the RSRM Program.

• A review of the build records of all loaded RSRM flight motors found no unique design,
material, or fabrication history that can be correlated with producing pressure perturbations.

• All motors are predicted to meet flight requirements.
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PROBLEM SOLUTION (continued):

• [illegible] most probable scenario, a conservative 3 sigma thrust imbalance upper bound is
75 klb. The shuttle system is capable of accepting a 75 klb thrust imbalance.

• Reference PRCBD No. S052158Q

[illegible] was closed on 03-19-93 with signatures obtained outside the board on PRCBD No.
-044892E. The problem report has been closed for the next three flights or six months,
whichever comes first. Deferred.
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Appendix  5. 

Comparison  of  Methods  for  Calculating  the  Effect  of 
Pressure  Perturbations  on  SRB  Thrust 
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Comparison  of  methods  for  calculating  the  effect  of  pressure 
perturbations  on  SRB  thrust 

SUMMARY: 

1.  The  NASA  RSS  solution  to  the  SRB  thrust  equation  is  NOT  conservative  if 
the  sources  of  variation  in  SRB  thrust  are  correlated. 

2.  The  RSS  method  of  solution,  which  assumes  symmetric  distributions,  is  more 
conservative  than  propagating  skewed  (e.g.:  lognormal)  probability  distributions  for 
the  existing  pressure  spike  data. 

3.  The  RSS  example  (based  on  2 sample  / second  data)  produces  a higher  (more 
conservative)  total  thrust  than  is  indicated  by  analysis  of  the  raw  12.5  sample  / 
second  data. 

4. The correlation of maximum pressure peaks between left and right motors is
more likely due to normal inter- and intra-motor pressure variations than to
pressure spike variations associated with slag “sloshing” and ejection.

NASA uses a Root-Sum-Squares (RSS) method to combine uncertainties in the terms of
the SRB thrust equation to determine the upper bound of thrust for calculating
Factors of Safety. Two of the key assumptions in the RSS approach are independent
sources of variation and symmetrically distributed variations. The extent to which these
assumptions are not met, and the impact of not meeting them, were examined by solving
the SRB thrust equation by propagating uncertainty distributions (in Monte Carlo
simulation).

The NASA RSS solution is NOT conservative if the sources of variation are
correlated (not independent). The assumed factor of safety for the SRB thrust equation
example (RSS) provided by NASA was 1.280. Using the same data but assuming a
reasonable correlation between two terms of the SRB thrust equation resulted in a
calculated factor of safety of 1.276. In the limiting case of all terms perfectly correlated,
the calculated factor of safety is 1.217.

The assumption of symmetrically (normally) distributed pressure spike variations resulted
in a more conservative (higher) upper bound on thrust than the alternative lognormal
distributions developed by F. Safie (MSFC) or those developed by SAIC for this analysis.
In general, assuming a skewed distribution for the pressure spikes (blips) results in a
slightly asymmetric total thrust distribution with a higher mean but a smaller 99.87%
(one-sided upper) certainty bound than the normal distribution implied by the RSS solution.
Since the factor of safety calculation is based on the 99.87% certainty bound, the
asymmetric solutions result in a higher factor of safety than the symmetric (RSS)
assumption. For the various distributions examined here, the NASA RSS method is
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therefore  more  conservative  than  propagating  skewed  distributions  which  better 
reflect  the  actual  distribution  of  the  pressure  spike  data. 

SAIC analyzed the 12.5 sample per second SRB pressure data by separating the inter-motor
variations (mean pressure variations from motor to motor), the nominal intra-motor
variations (normally distributed variations in pressure within each motor of relatively low
amplitude), and the pressure spike variations (strictly positive, relatively high amplitude
variations above the nominal intra-motor population) for the 66 to 76 second period of
interest. Straightforward analysis of this data indicated that the NASA RSS example
(based on 2 sample / second data) produces a higher (more conservative) total thrust
than is indicated by the raw data.

This analysis also found that there is a significant correlation in both inter- (0.42) and
intra- (0.68) motor variation between the left and right motors, but little correlation in the
pressure spike variations (0.265) between left and right motors. It has been noted that 6
of the 8 highest maximum pressure peaks occurred in the left and right motors on 3 flights.
This led to speculation that slag accumulation and ejection (the postulated cause for the
high pressure spikes) may be related to flight dynamics or other mission-specific
characteristics. The relatively low correlation between left and right motor pressure spike
populations suggests that the correlation of maximum pressure peaks between left
and right motors is more likely due to inter- and intra-motor pressure variations
than to pressure spike variations associated with slag ejection.


Discussion: 


Most  analyses  of  the  SRB  pressure  spike  phenomenon  have  focused  on  the  effect  of 
pressure  spikes  (blips)  on  SRB  thrust,  and  the  resulting  change  in  the  static  load  factor  of 
safety.  The  static  load  factor  of  safety  (FOS)  is  defined  as  the  load  at  which  the  structure 
is  expected  to  fail  divided  by  the  maximum  plausible  load  to  which  the  structure  will  be 
subjected.  Determining  the  maximum  plausible  total  SRB  thrust  is  the  essential  element  of 
these  analyses.  This  analysis  compares  the  current  (RSS)  method  of  determining 
maximum  plausible  thrust  to  the  fully  probabilistic  method  of  adding  distributions  in 
simulation. 

The current method combines the sources of SRB thrust variation by adding the square root
of the sum of the squares (root-sum-square, RSS) of maximum plausible variations
to the nominal thrust to find the maximum plausible thrust. It has been pointed out
elsewhere that the underlying assumptions of the RSS process, notably the independence,
symmetry, and equal probability of the variations, may have been violated. This analysis
shows how the method of propagating uncertainty distributions can readily accommodate
the violation of those assumptions, and illustrates the impact of these violations on the
computed factor of safety.
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Objectives: 

This  analysis  has  four  objectives:  (1)  Illustrate  the  method  of  propagating  uncertainty 
distributions  and  show  its  equivalence  to  the  RSS  method  in  the  limit  that  RSS 
assumptions  are  valid.  (2)  Show  how  a violation  of  the  RSS  independence  assumption 
affects  the  calculated  factor  of  safety.  (3)  Determine  whether  there  is  any  significant 
change  in  computed  factor  of  safety  when  the  underlying  distributions  in  the  thrust 
equation  are  not  symmetric.  (4)  Examine  the  12.5  sample  / second  data  provided  by 
NASA  to  determine  whether  there  is  any  significant  change  in  computed  factor  of  safety 
compared to the thrust equation solutions.

Analysis  and  Results: 

Objective  1: 

Using the values in the example provided by NASA (Table 1), the RSS form of the thrust
equation (Equation 1) yields 5,309,000 lbf as the 3-sigma upper bound of total SRB thrust. A
1.28 factor of safety implies that the ultimate load (nominal failure point of the structure)
is equivalent to 6,795,520 lbf thrust. This relationship is shown graphically in Figure 1.
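
The arithmetic linking the thrust upper bound, the factor of safety, and the ultimate load can be sketched as follows. This is only an illustration of the definition used in this appendix (ultimate load divided by maximum plausible load); the constant and function names are hypothetical, and the alternative thrust bounds are the values quoted in Table 1, in thousands of lbf.

```python
# Values quoted in the text, in thousands of lbf (klbf).
RSS_UPPER_BOUND_KLBF = 5309.0  # 3-sigma upper bound on total SRB thrust (RSS example)
BENCHMARK_FOS = 1.28           # factor of safety assumed for the RSS example

# Ultimate load implied by the benchmark: upper bound times factor of safety.
ULTIMATE_LOAD_KLBF = RSS_UPPER_BOUND_KLBF * BENCHMARK_FOS


def factor_of_safety(thrust_upper_bound_klbf: float) -> float:
    """Factor of safety for a given thrust upper bound, holding the ultimate load fixed."""
    return ULTIMATE_LOAD_KLBF / thrust_upper_bound_klbf


for label, bound in [
    ("RSS example (as calculated)", 5309.0),
    ("high & low uncorrelated", 5238.0),
    ("all terms fully correlated", 5583.0),
]:
    print(f"{label:30s} bound = {bound:6.0f} klbf  FOS = {factor_of_safety(bound):.3f}")
```

Because the ultimate load is held fixed at the benchmark value, a smaller thrust upper bound always yields a larger factor of safety, which is why the alternative formulations are compared on either scale interchangeably.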

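Equation 1. RSS form of the SRB thrust equation (subscripts that are illegible in the source are shown as [?]):

$$F_{high} = F_{[?]} + \Delta F_{[?]} + \Delta F_{PMBT} + \Delta F_{[?]} + \left(\Delta F_{[?]}^2 + \Delta F_{[?]}^2 + \Delta F_{PMBT\,UDC}^2 + \Delta F_{[?]}^2 + \Delta F_{[?]}^2 + \Delta F_{pp}^2\right)^{1/2}$$

$$F_{low} = F_{[?]} + \Delta F_{[?]} + \Delta F_{PMBT} + \Delta F_{[?]} + \Delta F_{[?]} + \left(\Delta F_{[?]}^2 + \Delta F_{[?]}^2 + \Delta F_{PMBT\,UDC}^2 + \Delta F_{[?]}^2 + \Delta F_{[?]}^2 + \Delta F_{pp}^2\right)^{1/2}$$

$$F_{total} = F_{high} + F_{low}$$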

Note: Adding the 3-sigma upper bound values of F_high and F_low results in an F_total upper
bound significantly higher than the 3-sigma upper bound on F_total (for F_high and F_low
uncorrelated). It is equivalent to assuming that the variations in the high motor are
perfectly  correlated  with  the  variations  in  the  low  motor.  Since  the  uncertainty  among 
terms  for  each  SRB  are  treated  as  uncorrelated  this  may  have  been  inadvertent,  but  it 
results  in  a very  conservative  estimate  of  total  SRB  maximum  plausible  thrust  as  shown  in 
Table  1.  In  the  example  calculation  provided  by  NASA  it  is  noted:  “EXAMPLE  IS  FOR 
ILLUSTRATIVE  PURPOSES  ONLY.  THE  ACTUAL  LOADS  CALCULATION 
METHODOLOGY  IS  MUCH  MORE  INVOLVED”.  If  the  “actual  loads  methodology 
calculation”  differs  significantly  from  the  example,  in  particular,  if  the  actual  methodology 
does  not  simply  add  the  upper  bounds  on  high  and  low  thrust  to  determine  the  upper 
bound  on  total  thrust,  then  statements  made  in  this  analysis  regarding  the  relative 
conservatism  of  the  RSS  solution  are  invalid.  Except  where  explicitly  noted,  all  of  the 
distributions  shown  in  this  analysis  retain  the  conservative  assumption  of  correlation 
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between  the  high  and  low  motors  in  order  to  keep  the  results  comparable  with  the  thrust 
equation RSS solution.


Figure 1. Relationship between RSS 3-sigma Thrust, Factor of Safety, and Ultimate Load
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Table  1 : Values  Used  in  the  SRB  Thrust  Equation  (* 1000  lbf) 


Term                 High       Low        1 sigma

[illegible]          2510       2510       N/A
[illegible]          17         0          N/A
[illegible]          0          0          N/A
[illegible]          25.1       25.1       N/A
[illegible]          [blank]    -20        N/A
[illegible]          50         50         16.67
[illegible]          65         65         21.67
[illegible]          25         25         8.33
[illegible]          25         25         8.33
[illegible]          80         80         26.67
[illegible]          13         13         4.33

Thrust Upper Bound   2673       2636

                                                   Upper Bound   Corresponding Factor of Safety

F_total Upper Bound (as calculated)                5309          1.280
F_total Upper Bound (high & low uncorrelated)      5238          1.297
F_total Upper Bound (all terms fully correlated)   5583          1.217
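
The Table 1 upper bounds can be reproduced from the tabulated values. The sketch below assumes that the terms listed with a 1-sigma value of N/A are added directly and that the six terms with a 1-sigma value are 3-sigma variations combined by root-sum-square; this grouping is inferred from the table layout, and all names are illustrative. Values are in thousands of lbf.

```python
import math

# Terms added directly (inferred: the rows whose 1-sigma entry is N/A).
DIRECT_HIGH = [2510.0, 17.0, 0.0, 25.1]
DIRECT_LOW = [2510.0, 0.0, 0.0, 25.1, -20.0]

# 3-sigma variations combined by root-sum-square (same values for both motors).
RSS_TERMS = [50.0, 65.0, 25.0, 25.0, 80.0, 13.0]


def rss(values):
    """Root-sum-square of a list of variations."""
    return math.sqrt(sum(v * v for v in values))


single_motor_rss = rss(RSS_TERMS)
high_bound = sum(DIRECT_HIGH) + single_motor_rss
low_bound = sum(DIRECT_LOW) + single_motor_rss

# "As calculated": the two single-motor upper bounds are simply added,
# which is equivalent to treating high and low motors as fully correlated.
total_as_calculated = round(high_bound) + round(low_bound)

# High and low motors uncorrelated: RSS all twelve variations together.
total_uncorrelated = sum(DIRECT_HIGH) + sum(DIRECT_LOW) + rss(RSS_TERMS + RSS_TERMS)

# All terms fully correlated: variations add linearly instead of in quadrature.
total_fully_correlated = sum(DIRECT_HIGH) + sum(DIRECT_LOW) + 2.0 * sum(RSS_TERMS)

print(f"high motor upper bound:        {high_bound:7.1f}")
print(f"low motor upper bound:         {low_bound:7.1f}")
print(f"total, as calculated:          {total_as_calculated:7d}")
print(f"total, high/low uncorrelated:  {total_uncorrelated:7.1f}")
print(f"total, all terms correlated:   {total_fully_correlated:7.1f}")
```

Under these assumptions the "as calculated" total matches the table only when the two single-motor bounds are rounded before being added, which suggests the example was worked from the rounded values.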

Implicit  in  the  RSS  thrust  equation  is  the  concept  of  an  underlying  (normal)  distribution  of 
thrust  with  a mean  equal  to  the  sum  of  the  non-RSSed  terms,  and  standard  deviation  equal 
to 1/3 of the RSSed variation terms. This distribution is depicted in Figure 1 and in
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Figure  2 as  the  “Uncorrelated  (RSS)”  distribution.  Despite  the  nomenclature 
“Uncorrelated”, this distribution includes the correlation between the high and low motors
implicit in the example RSS calculation for the SRB thrust equation. (See the note under
Equation  1). 


The  RSS  thrust  equation  can  be  arranged  in  a form  suitable  for  propagating  uncertainty 
distributions  by  grouping  terms.  Solving  the  distribution  form  of  the  thrust  equation  (using 
Monte  Carlo  simulation)  results  in  a distribution  identical  to  the  one  implied  by  the  RSS 
thrust  equation.  The  ability  to  produce  a distribution  with  the  same  mean,  standard 
deviation,  and  3-sigma  (99.87%)  upper  bound  as  the  RSS  method  by  propagating 
uncertainty distributions using Monte Carlo simulation demonstrates that the method of
propagating uncertainties and the RSS method are equivalent in the limit that the RSS
assumptions of symmetry and independence are valid.
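
The equivalence described above can be illustrated with a small Monte Carlo run for a single (high) motor. This is a minimal sketch, not the simulation used in the study: it assumes each RSSed term in Table 1 is an independent, zero-mean normal variation whose standard deviation is one third of its tabulated value, and it uses only the Python standard library. The sample size and seed are arbitrary choices.

```python
import math
import random
import statistics

# Table 1 values for the high motor, in thousands of lbf.
DIRECT_TERMS_HIGH = [2510.0, 17.0, 0.0, 25.1]            # added directly
RSS_TERMS_3SIGMA = [50.0, 65.0, 25.0, 25.0, 80.0, 13.0]  # 3-sigma variations

MEAN_THRUST = sum(DIRECT_TERMS_HIGH)
RSS_VARIATION = math.sqrt(sum(t * t for t in RSS_TERMS_3SIGMA))
RSS_UPPER_BOUND = MEAN_THRUST + RSS_VARIATION


def simulate_high_motor(n_samples: int, seed: int = 0) -> list:
    """Propagate independent normal variations through the thrust sum."""
    rng = random.Random(seed)
    return [
        MEAN_THRUST + sum(rng.gauss(0.0, t / 3.0) for t in RSS_TERMS_3SIGMA)
        for _ in range(n_samples)
    ]


samples = sorted(simulate_high_motor(100_000))
sample_mean = statistics.fmean(samples)
sample_std = statistics.stdev(samples)
# Empirical 99.865th percentile (the one-sided 3-sigma point, quoted as 99.87%).
percentile_9987 = samples[int(0.99865 * len(samples))]

print(f"RSS:         mean {MEAN_THRUST:.1f}, sigma {RSS_VARIATION / 3.0:.2f}, bound {RSS_UPPER_BOUND:.1f}")
print(f"Monte Carlo: mean {sample_mean:.1f}, sigma {sample_std:.2f}, bound {percentile_9987:.1f}")
```

With independent, symmetric inputs the simulated mean, standard deviation, and upper percentile agree with the RSS values to within sampling noise; the propagation approach becomes useful once those assumptions are relaxed.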

Objective  2. 


If  two  or  more  terms  in  the  RSS  thrust  equation  are  known  (or  believed)  to  be  correlated, 
then  the  RSS  method  does  not  produce  a conservative  result.  While  a rigorously  correct 
derivation  of  the  RSS  thrust  equation  could  be  developed  to  handle  correlated  factors,  the 
propagation  method  handles  correlation  quite  easily,  by  specifying  a correlation  coefficient 
between  two  or  more  factors  for  the  Monte  Carlo  simulation.  Figure  2 shows  the  original 
(uncorrelated)  total  thrust  distribution  and  the  total  thrust  distribution  which  would  result 
if two factors (the block and shape terms) were correlated (correlation coefficient = 0.75). The
extreme  case  of  non-independence,  in  which  all  factors  in  the  SRB  thrust  equation  are 
fully  correlated  is  also  shown.  Figure  3 illustrates  these  relationships  in  greater  detail  by 
focusing  on  the  upper  tails  of  the  distributions.  The  axis  on  the  right  hand  side  of  the 
Figures  shows  the  factor  of  safety  associated  with  the  99.87%  upper  certainty  bound  on 
the  distributions. 
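
The effect of correlation on a combined variation can be sketched for two terms. For normal variations with 3-sigma half-widths a and b and correlation coefficient rho, the combined half-width is sqrt(a^2 + b^2 + 2*rho*a*b), which reduces to the RSS value at rho = 0 and to the linear sum at rho = 1. The pair of values below (65 and 80) is a hypothetical choice from Table 1, since the source does not legibly identify which rows are the correlated terms, so the output is not expected to reproduce the 5325 figure in Table 2.

```python
import math
import random


def combined_3sigma(a: float, b: float, rho: float) -> float:
    """3-sigma half-width of the sum of two normal terms with correlation rho."""
    return math.sqrt(a * a + b * b + 2.0 * rho * a * b)


def sample_correlated_pair(rng, sigma_a: float, sigma_b: float, rho: float):
    """Draw one pair of zero-mean normal variates with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return sigma_a * z1, sigma_b * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)


TERM_A, TERM_B = 65.0, 80.0  # hypothetical pairing, thousands of lbf (3-sigma)
RHO = 0.75                   # correlation coefficient used in the text

rss_only = combined_3sigma(TERM_A, TERM_B, 0.0)
with_correlation = combined_3sigma(TERM_A, TERM_B, RHO)
fully_correlated = combined_3sigma(TERM_A, TERM_B, 1.0)

# Monte Carlo check of the closed-form result.
rng = random.Random(1)
sums = [sum(sample_correlated_pair(rng, TERM_A / 3.0, TERM_B / 3.0, RHO))
        for _ in range(50_000)]
mc_3sigma = 3.0 * math.sqrt(sum(s * s for s in sums) / len(sums))

print(f"rho = 0 (RSS): {rss_only:.1f}")
print(f"rho = {RHO}:    {with_correlation:.1f} (Monte Carlo {mc_3sigma:.1f})")
print(f"rho = 1:       {fully_correlated:.1f}")
```

Any positive correlation widens the combined variation beyond the RSS value, which is the sense in which the RSS result is not conservative for correlated terms.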


Figure 2. Relationship Between 3-Sigma Thrust, Factor of Safety, and Load
for RSS (Independent) & Correlated Inputs
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The RSS solution to the thrust equation is clearly not conservative if the variations in the
terms of the thrust equation are correlated.
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Figure  3.  Comparison  of  SRB  Thrust  Distributions  for  Correlated  and 
Uncorrelated Sources of Variation.
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Objective  3. 

Most  (if  not  all)  of  the  analysts  and  reviewers  of  the  SRB  pressure  “blip”  data  noted  that 
there  appear  to  be  different  sub-populations  of  pressure  variations  embedded  in  the  data. 

It  was  almost  universally  noted  that  positive  pressure  blips  were  larger  than  negative 
“dips”,  and  appeared  to  have  a different  physical  root  cause  than  symmetric  random 
variations  about  the  nominal  pressure  profile. 

The  RSS  solution  to  the  SRB  thrust  equation  is  incapable  of  handling  asymmetric 
(skewed)  variations.  Implicit  in  the  idea  of  RSSing  variation  terms  is  the  demand  that 
every positive pressure excursion is (on average) matched by some combination of
negative  pressure  excursions,  and  vice-versa.  Furthermore,  the  RSS  method  demands  that 
the  probability  of  all  sources  of  variation  occurring  be  equal.  The  observed  pressure  blips 
do  not  appear  to  occur  with  the  same  frequency  as  other  random  variations  in  the  pressure 
profile,  so  it  is  likely  that  the  source  of  the  blips  does  not  have  the  same  probability  as 
other  random  (and  symmetric)  variations. 

Figures 4 through 8 depict the results of propagating the skewed pressure blip
distributions developed by F. Safie of MSFC. While Safie’s work provides some insight
into the effect of segregating the population of pressure variations, he did not show the
impact on total SRB thrust or factor of safety. To measure that impact we replaced the
shape term (ΔF_shape) in the thrust equation with the distributions proposed by Safie. The results are
uniformly higher factors of safety (smaller upper bounds on thrust).
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Figure 5. Lognormal Distribution for Pressure Blips
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Figure 7. Lognormal Distribution for Pressure Blips
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Although  the  lognormal  distribution  is  strictly  greater  than  0,  the  upper  bound  on  thrust 
for  these  distributions  is  smaller  than  the  upper  bound  associated  with  the  normal 
distribution  used  to  fit  the  same  pressure  blip  data.  This  apparent  anomaly  is  due  to  the 
fact  that  the  lognormal  distribution  provides  a closer  fit  to  the  pressure  spike  data  than  the 
normal distribution. The effect on the overall SRB thrust distribution is to increase the
probability density in the region between the mean and the 99.87% upper bound, shifting
the mean higher but pulling the 99.87th percentile closer in, resulting in a smaller upper
bound on thrust and, consequently, a higher factor of safety.

Objective  4. 


It is not clear that the thrust equation captures all sources of uncertainty in SRB thrust, or
that  the  values  given  to  the  terms  of  the  equation  (which  were  derived  from  2 sample  / 
second  data)  capture  the  same  range  as  the  12.5  sample  / second  data.  In  principle,  the 
thrust  equation  should  capture  the  uncertainty  in  SRB  thrust  from  a variety  of  sources, 
only  one  of  which  is  observed  variability  in  the  SRB  pressure  profile.  An  important 
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“sanity  check”  on  the  thrust  equation  is  to  ensure  that  the  maximum  plausible  thrust 
(99.87-th  percentile)  generated  by  the  equation  is  at  least  as  conservative  as  an  upper 
bound  on  thrust  generated  from  the  pressure  data  alone. 


SAIC  examined  the  12.5  sample  / second  data  and  developed  a segregated  data  set  which 
would  allow  an  alternate  approach  to  the  SRB  thrust  equation  to  determine  the  “maximum 
plausible  thrust”  based  only  on  variability  in  the  data  and  uncertainty  in  converting 
pressure  to  thrust.  In  this  approach,  only  the  pressure  variations  during  the  65  to  76 
second  interval  were  examined,  since  all  physical  mechanisms  for  the  occurrence  of  the 
pressure  “blips”  are  postulated  to  occur  in  that  time  frame.  The  data  was  segregated  to 
examine  motor-to-motor  (inter-motor)  pressure  variations,  symmetric  variations  about  the 
nominal  value  in  a given  motor  (intra-motor),  and  the  skewed  pressure  variations 
associated  with  the  pressure  “blips”.  Figures  8 through  12  depict  these  distributions. 


Figure 8. “Raw” Combined SRB Thrust Distribution Based on 12.5 Sample /
Second Pressure Data

[Figure: 12.5/sec Pressure Data Derived RSRM Thrust in 66-76 sec Interval; horizontal axis: Thrust (lbf x 1000)]


Note on Figure 8 that a normal curve based on the mean and standard deviation of the
data is not a good fit. The underlying data shown in the histogram appears to have a
normally distributed component with a somewhat lower mean than the fitted curve, and an
additional component for pressure spikes above the mean. SAIC found that an excellent
fit to the data was given by resolving the data into three components: normally
distributed motor-to-motor variations in nominal pressure (Inter-Motor); normally
distributed variations within a motor (Intra-Motor); and lognormally distributed pressure
spikes remaining when the normally distributed Intra- and Inter-Motor variations were
removed.


Risk Implications of Space Shuttle SRB Pressure Excursions: Appendix 5.

SAICNY 94-01-10


Figure 9. Inter-Motor (Motor to Motor) Variation in SRB Pressure

[Figure: RSRM Inter-Motor Nominal Pressure Change in 66-76 sec Interval]
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Figure  10.  Inter-Motor  Left  / Right  Pressure  Differential 
[Figure: Inter-Motor Nominal Pressure Difference in 66-76 sec Interval]
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Figure  11.  Maximum  Pressure  Excursions  Adjusted  for  Inter-Motor  Variation 
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Figure  12.  Nominal  Intra-Motor  Thrust  Distribution  Adjusted  for  Inter  Motor 

Variation. 
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Figure 13. Pressure Spike Distribution After Adjusting for Inter- and Nominal
Intra-Motor Variation.

[Figure: Histogram and Density Function of Pressure Spike Related 12.5/sec Data]


Note that when inter- and nominal intra-motor variations are removed, the maximum
pressure spike above the nominal pressure is 13.5 psi.

To ensure that the SRB thrust equation upper bound captured at least the variability in the
pressure data, a series of Monte Carlo simulations was performed. One set of simulations
was based on a normal distribution using the mean and standard deviation of the “raw”
pressure data. Since a normal distribution did not appear to fit the data particularly well, a
second set of simulations was performed using the combined inter-motor, nominal
intra-motor, and spike distributions described above. The results of these simulations, along
with the other numerical results of this analysis, are summarized in Table 2.
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Table  2. Summary  of  Numerical  Results 


Method of Calculation | Thrust Upper Bound (x 1000 lbf) | Corresponding Factor of Safety | Comments

Thrust Eqn, RSS, Example Data | 5309 | 1.280 | Benchmark - Defines ultimate equivalent thrust for all FOS calculations. Implicit assumption that high & low motor variations are fully correlated.

Thrust Eqn, Propagated, Example Data, High & Low Correlated | 5310 | 1.280 | Duplication of RSS results using propagation of uncertainties.

Thrust Eqn, RSS, Example Data, High & Low NOT Correlated | 5238 | 1.297 | Bulk of conservatism in RSS example is due to tacit assumption that high & low are correlated.

Thrust Eqn, Propagated, Example Data, High & Low NOT Correlated | 5240 | 1.297 |

Thrust Eqn, Propagated, Example Data, Block & Shape Correlated (0.75) | 5325 | 1.276 | Best guess at actual correlation of thrust equation terms except conservative assumption of high & low correlated.

Thrust Eqn, Propagated, Example Data, All Terms Fully Correlated | 5585 | 1.217 | (Unreasonable) Worst Case Correlation.

Thrust Eqn, RSS Solution, Example Data, All Terms Fully Correlated | 5583 | 1.217 | (Unreasonable) Worst Case Correlation. Further verification that propagation matches RSS for same assumptions.

Thrust Eqn, Propagated, Lognormal - all RSRM blips, No Correlation (exc. high/low) | 5277 | 1.288 | Replace DFshape term in Thrust Equation with lognormal fit to all RSRM blips (Safie).

Thrust Eqn, Propagated, Lognormal - all RSRM blips w/out top 4, No Correlation (exc. high/low) | 5277 | 1.288 | Replace DFshape term in Thrust Equation with lognormal fit to all RSRM blips except top 4 (Safie).

Thrust Eqn, Propagated, Lognormal - all WECCO blips w/out top 4, No Correlation (exc. high/low) | 5286 | 1.285 | Replace DFshape term in Thrust Equation with lognormal fit to all WECCO blips (Safie).

Thrust Eqn, Propagated, Lognormal - WECCO blips + top 4 as separate population, No Correlation (exc. high/low) | 5279 | 1.287 | Replace DFshape term in Thrust Equation with lognormal fit to all WECCO blips except top 4 (Safie).

12.5 Sample / Sec Data, "Raw" data - normal distribution, No Correlation (exc. high/low) | [illegible] | [illegible] | Normal distribution fit to 12.5 sample / sec data in 66 - 76 second interval.

12.5 Sample / Sec Data, RSS, "Raw" data - normal distribution, No Correlation (exc. high/low) | [illegible] | [illegible] | Normal distribution fit to 12.5 sample / sec data in 66 - 76 second interval.

12.5 Sample / Sec Data, Propagated, SAIC separation of "Raw" data - normal inter- & intra-motor; lognormal spike, No Correlation (exc. high/low) | [illegible] | [illegible] | Separate 12.5 sample / sec data from 66 - 76 second interval into normal inter- and intra-motor distributions + lognormal spike distribution.

12.5 Sample / Sec Data, Propagated, SAIC separation of "Raw" data - normal inter- & intra-motor; lognormal spike, Actual Right/Left Correlation Coefficients | [illegible] | [illegible] | SAIC's best estimate of actual Factor of Safety based on variation in observed 12.5 sample / sec data in 66 - 76 second interval. The thrust equation conservatively bounds this value.


The  RSS  solution  to  the  SRB  thrust  equation  appears  to  provide  a conservative  upper 
bound  on  thrust  relative  to  every  reasonable  alternative  formulation  examined,  with  the 
important  exception  of  correlation  among  the  terms  of  the  equation.  It  is  recommended 
that  NASA  identify  the  extent  to  which  the  terms  in  the  thrust  equation  are  correlated,  and 
incorporate a means for dealing with correlation when calculating maximum plausible
thrust and factors of safety.

The  thrust  equation  produces  conservative  upper  bounds  on  thrust,  and  therefore 
reasonably  conservative  factors  of  safety,  primarily  because  of  the  implicit  assumption  that 
the  thrust  variation  in  the  high  and  low  motors  is  fully  correlated.  Since  the  measured 
correlation  coefficient  between  right  and  left  motors  is  0.63,  the  tacit  assumption  of  100% 
correlation  is  not  excessively  conservative. 
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Appendix  6. 

SRB  Pressure  Plots  Used  in  Independent  Statistical 
Analysis  (excerpt  from  “Solid  Rocket  Booster  Chamber 
Pressure  Perturbation  Review  Committee  Presentation  to 

NASA,”  14  January  1994) 
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Appendix  7. 

Methodology  for  Determining  Minimum  Required 
External Tank Structural Safety Factor (excerpt from
"External Tank Evaluation of RSRB Pressure
Perturbation," 6 January 1994)
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SUMMARY 

This report describes the first phase of a study designed to improve the
management and the safety of the black tiles of the Space Shuttle orbiter. This study
is based on the coupling of a probabilistic risk assessment (PRA) model and relevant
organizational factors. In this first-phase report, a first-order PRA model is developed
and used to design a risk-based criticality scale combining the probabilities and the
consequences of tile failures. This scale can then be used to set priorities for the
maintenance and gradual replacement of the black tiles.

A risk-criticality index is assessed for each tile based on its contribution to the
probability of loss of the vehicle. This index reflects the loads to which each tile is
subjected (heat, vibrations, debris impacts, etc.) and the dependencies among
failures of adjacent tiles. It also includes the potential decrease of tile capacity
caused by imperfect processing (e.g., a weak bond), and the criticality of subsystems
exposed to extreme heat loads at re-entry in case of tile failure and burn-through.
Using this model and some preliminary data, it is found that the (mean) probability of
loss of an orbiter due to failure of the black tiles is on the order of 10^-3 per flight, with
about 15% of the tiles accounting for 80% of the risk. One of the report's key findings
is that not all the most risk-critical tiles are in the hottest areas of the orbiter's surface;
some are in zones of highest functional criticality (see Figure 23).

Management  factors  that  can  affect  tile  safety  are  identified  as:  (1)  time
pressures  that  increase  the  probability  of  cutting  corners  in  processing;  (2)  liability
concerns  and  conflicts  among  contractors,  which  affect  the  flow  of  information;  (3)  the 
low  status  of  the  tile  work  and  the  turnover  among  tile  technicians,  which  may 
increase  the  work  load  and  decrease  its  quality;  (4)  the  need  for  more  random  testing 
to  detect  imperfect  bonds  and  to  monitor  the  evolution  of  the  system  over  time;  and 
(5)  the  handling  of  the  external  tank  and  the  solid  rocket  boosters  whose  insulations 
constitute  a major  source  of  the  debris  that  could  hit  the  tiles  at  take-off. 
Safety of the Thermal Protection System of the Space Shuttle Orbiter:
Quantitative Analysis and Organizational Factors

Phase 1:

Risk-based priority scale and preliminary observations

Section 1:

INTRODUCTION


The National Aeronautics and Space Administration (NASA) manages many
aspects of the Space Shuttle Orbiter program under tight resource constraints: time,
money, human resources, management's attention, etc. The
maintenance of the orbiter's Thermal Protection System (TPS) is an example of
operations that must deal with these limitations. The processing of the tiles
between flights is labor intensive and time consuming and, because it is often on the
critical path to the next launch, the work has to be done under sometimes severe
time constraints. Although great attention is dedicated to the tile work, its quality is
occasionally affected by this demanding schedule. The importance of the tiles varies
according to their location on the orbiter's surface. Over some areas of the orbiter's
surface, several tiles could be lost without causing major damage or risking the lives
of the crew; in other areas the loss of a single tile could be catastrophic. This report
shows that the contributions of different tiles to the overall probability of failure
(defined here as "risk-criticality") vary widely according to their locations on the
orbiter's surface. A large percentage of the probability of loss of vehicle (LOV) due to
failure of the orbiter's TPS can be attributed to a small fraction of the tiles. Because
there will always be resource constraints, setting priorities is a first critical step
towards ensuring that the most risk-critical tiles receive maximum care and quality
control so as to minimize the probability of failure.
The level of risk-criticality of a tile depends on several factors and not
exclusively on the maximum heat load (temperature and duration) to which it is
subjected. These factors include: (1) the heat loads, (2) the location of the tile with
respect to possible trajectories of debris (e.g., pieces of insulation from the external
tank (ET) and the solid rocket boosters (SRBs)), (3) the vibrations and aerodynamic
forces, and (4) the criticality of the subsystems located directly under the aluminum
skin of the orbiter. Failure of a single tile located directly over one of the most critical
systems (such as the avionics, fuel cells, or hydraulic lines) is likely to cause a LOV
even though these tiles are not exposed to the maximum heat loads. By contrast,
severe tile damage next to the [illegible] of a wing has been survived in past missions.

Therefore, the loads and consequence factors must be combined to estimate the
probability of failure and to determine the risk-criticality of each tile.


A tile fails because the loads on it reach values that exceed its capacity.
Understanding both factors, loads and capacities, is thus critical to the quantification
of the risk associated with the TPS. The capacities vary considerably among
individual tiles because of differences in installation conditions and procedures. For
example, inspections have shown that several tiles have been installed with bonding
on only 10% of the contact surface. In addition, the capacities of some tiles have
decreased over time because of chemical reactions of the bond with some of the
waterproofing agents used on the orbiter. Similarly, the loads on the tiles are not
uniform. In addition to expected loads of heat, vibrations, and aerodynamic forces, a
tile may also be subjected to unexpected loads caused by debris impacts. The
source of most of the debris is poorly installed and maintained insulation on the ET
and the SRBs. Therefore, both loads and capacities can be greatly affected by a
variety of possible human errors.
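
The load-versus-capacity view of tile failure can be illustrated with a short sketch. It assumes, purely for illustration, that load and capacity are independent and normally distributed, in which case the failure probability P(load > capacity) has a closed form. The function names and all numerical values (in psi) are hypothetical and are not taken from the study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def failure_probability(load_mean, load_sd, cap_mean, cap_sd):
    """P(load > capacity) for independent, normally distributed load and capacity.

    The safety margin (capacity - load) is then also normal, and the tile
    fails when the margin is negative.
    """
    margin_mean = cap_mean - load_mean
    margin_sd = math.sqrt(load_sd ** 2 + cap_sd ** 2)
    return normal_cdf(-margin_mean / margin_sd)


# Hypothetical values in psi, chosen only to show the effect of a weak bond.
nominal = failure_probability(load_mean=3.0, load_sd=1.0, cap_mean=13.0, cap_sd=2.0)
weak_bond = failure_probability(load_mean=3.0, load_sd=1.0, cap_mean=4.0, cap_sd=2.0)

print(f"well-bonded tile: {nominal:.2e}")
print(f"weakly bonded tile: {weak_bond:.2e}")
```

Under these assumptions, lowering the mean capacity (as a partial bond would) raises the failure probability by several orders of magnitude even though the load distribution is unchanged.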

Some of these errors can be traced back to weak organizational
communications, misguided incentives, and resource constraints, which in turn can
be linked to the rules, the structures, and the culture of the organization (Paté-Cornell
and Bea, 1989; Paté-Cornell, 1990). Efficiency of the risk management process for
the TPS requires an integrated approach (National Research Council, 1988).
Considering only organizational solutions or only technical solutions to minimize the
risk of failure would be counterproductive and wasteful. Furthermore, each individual
system cannot be evaluated and managed independently. The performance of the
ET and SRBs affects the reliability of the tiles which, in turn, affects the performance
of the subsystems that they protect from heat loads. Therefore, when setting
priorities, the management teams for the ET and SRBs must account for the potential
detrimental side effects of their procedures on the orbiter's TPS. By tracing back,
even roughly, the location of the insulation on the ET and SRBs that could hit the
most risk-critical spots on the orbiter's surface, it may be possible to identify the spots
that should be given top priority.

1.1 Objectives of the overall project

The objective of this study is to provide recommendations to improve the
management of the tiles at Kennedy Space Center (KSC), Florida, based on the development
and extension of a Probabilistic Risk Analysis (PRA) model for the TPS of the Space
Shuttle Orbiter with emphasis on the black tiles. The approach is to include in the
analysis not only technical aspects that are captured by classical PRA (for example,
resistance of the tiles to debris impact), but also the process of tile maintenance (for
instance, when and how the tiles are tested) and the organizational procedures and
rules that determine this process (see Appendix 1: Paté-Cornell, 1989). The question
is whether these organizational factors affect the reliability of the tiles, and if they do,
to what extent. Linking the PRA inputs to some aspects of the process and the
organization allows addressing the often-raised question that PRA, although it
captures human errors, is of little help when considering more fundamental
managerial and organizational problems. This model is designed to allow
management to set priorities in the allocation of limited resources in a continuous
effort to improve the reliability of the Space Shuttle. The method thus allows for a
global approach to risk management, involving technical as well as organizational
improvements, while accounting for the uncertainties about the system's properties
and human performance. In cases where the problem is sufficiently well defined,
one can then assess (even if only coarsely) the corresponding increase of reliability.

Uncertainties about the performance of a complex system such as the TPS of
the Space Shuttle can be first described by its probability of failure (first-level
uncertainties). When computing this probability, one faces uncertainties about the
probabilities of the basic events including technical failures of individual components
and human errors. These uncertainties can be described by placing probability
distributions on the inputs, then computing the resulting uncertainty of the overall
failure probability (second-level uncertainties). The role and importance of these
second-level uncertainties depend on the intended use of the study. PRA can
generally support two types of decisions: (1) whether or not a system is safe enough
for operation on the basis of a chosen safety threshold or other acceptance criteria,
and (2) (the main objective of this study) how to allocate scarce resources among
different subsystems on the basis of risk-based priorities in order to achieve
maximum overall safety. The depth of the supporting risk analysis must be adapted
to the decision to be made.

In the first type of decision, where one is trying to decide if a system is safe
enough, it is important to describe the result of the risk assessment not only by a
point estimate of the failure probability but by a full distribution of this probability
reflecting all the uncertainties of the input values. Second-order uncertainties, which
are particularly critical for repeated operations, become important because they give
the decision makers an indication of the accuracy of the analysis. A different launch
alternative may be preferred if, for example, the mean probability of mission failure is
less than one in a thousand but can take values as high as one in fifty. Note,
however, that the overall failure probability per operation is the mean of that
distribution.
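
The propagation of second-level uncertainties described above can be sketched as a small Monte Carlo calculation. This is an illustration only: the lognormal form of the input distributions, the two-event series system, and all numerical values are assumptions made here, not inputs of the study.

```python
import math
import random
import statistics


def sample_system_failure_probability(rng, medians, sigma):
    """Draw one value of the overall failure probability.

    Each basic-event probability is drawn from a lognormal distribution
    (an assumed form), capped at 1. The system is assumed to fail if any
    one of the independent basic events occurs.
    """
    survive = 1.0
    for median in medians:
        p = min(1.0, rng.lognormvariate(math.log(median), sigma))
        survive *= 1.0 - p
    return 1.0 - survive


rng = random.Random(0)          # fixed seed so the sketch is repeatable
medians = [1e-4, 5e-5]          # hypothetical median probabilities of two basic events
sigma = 1.5                     # hypothetical spread of the uncertainty (log scale)

draws = sorted(
    sample_system_failure_probability(rng, medians, sigma) for _ in range(20000)
)

mean_p = statistics.fmean(draws)            # point estimate used for ranking
median_p = draws[len(draws) // 2]
p95 = draws[int(0.95 * len(draws))]         # upper value a decision maker may care about

print(f"median {median_p:.2e}, mean {mean_p:.2e}, 95th percentile {p95:.2e}")
```

With right-skewed input distributions such as these, the mean lies well above the median, and the upper percentiles can be several times larger than the mean, which is why a full distribution matters for a "safe enough" decision.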
In  the  second  type  of  decision,  where  the  objective  is  an  optimal  allocation  of
resources,  the  priority  ranking  has  to  be  based  on  a single  point  estimate  for  the 
probability  of  failure.  For  optimality  reasons,  the  mean  of  the  distribution  of  the  failure 
probability  is  the  relevant  characteristic.  In  this  case,  critical  factors  are,  first,  the 
relative  values  of  the  probabilities  of  mission  failure  associated  with  failure  of  each 
component,  and  second,  the  variations  of  these  relative  probabilities  with  additional 
units  of  resources  (e.g.,  time).  The  combination  of  these  two  factors  then  allows 
giving  priority  to  the  components  for  which  more  resources  will  bring  the  greatest 
increase  of  safety. 
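
The allocation logic of the preceding paragraph (rank by mean failure probability and by the change in that probability per additional unit of resource) can be sketched as a greedy procedure. The exponential risk-reduction model, the component names, and the numbers below are hypothetical and serve only to show the mechanics.

```python
import math


def residual_risk(p0, k, n):
    """Assumed model: each resource unit multiplies the failure probability by exp(-k)."""
    return p0 * math.exp(-k * n)


def marginal_gain(components, allocation, name):
    """Risk reduction obtained by giving one more resource unit to `name`."""
    p0, k = components[name]
    n = allocation[name]
    return residual_risk(p0, k, n) - residual_risk(p0, k, n + 1)


def allocate(components, units):
    """Give each unit to the component where it reduces mean risk the most."""
    allocation = {name: 0 for name in components}
    for _ in range(units):
        best = max(components, key=lambda name: marginal_gain(components, allocation, name))
        allocation[best] += 1
    return allocation


# Hypothetical components: (mean failure probability, effectiveness of one resource unit)
components = {"A": (1e-3, 0.1), "B": (2e-4, 0.5), "C": (5e-5, 0.9)}
plan = allocate(components, units=10)

risk_before = sum(p0 for p0, _ in components.values())
risk_after = sum(residual_risk(p0, k, plan[name]) for name, (p0, k) in components.items())

print(plan)
print(f"total mean risk: {risk_before:.2e} -> {risk_after:.2e}")
```

The sketch uses both factors named in the text: a component with a large failure probability but a small response to resources can still lose priority to a smaller contributor that responds strongly.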

In  this  study,  we  construct  first  a priority  scale  for  the  black  tiles  based  on  our 
current  estimates  of  the  means  of  the  partial  failure  probabilities,  i.e.,  the  mean
probability  of  LOV  associated  with  the  potential  failure  of  each  tile  (first-order  PRA). 
An  analysis  of  the  second-order  uncertainties  may  change  the  priorities  if  they 
change  the  means  of  these  partial  failure  probabilities.  Across  subsystems  (e.g.,  tiles 
versus  main  engines),  the  uncertainty  of  the  failure  probabilities  may  vary  widely 
because  the  failure  modes  involve  a spectrum  of  basic  events  whose  probabilities 
are  known  with  different  degrees  of  uncertainty.  In  this  case,  full  analysis  of 
uncertainties  may  well  change  the  means  themselves  and  the  optimal  resource 
allocation.  Within  a given  subsystem,  such  as  the  tiles,  the  inputs  of  the  analysis  for 
the  different  elements  (e.g.,  the  initiating  events)  are  generally  of  similar  nature  and 
the  variations  of  uncertainties  may  be  less  important.  Yet,  uncertainties  about
extreme  values  of  the  heat  loads  clearly  vary  according  to  the  location  of  a tile  on  the 
orbiter's  surface.  Furthermore,  the  probabilities  of  failure  (and  associated 
uncertainties)  of  the  subsystems  located  directly  under  the  skin  given  a loss  of  tile(s) 
and  burn-through  vary  widely.  Further  study  should  therefore  investigate  the  effect  of
second-order  uncertainties  to  determine  their  impact  on  the  resource  allocation. 

Our work on this problem is divided into two separate phases. The first
phase, which is presented in this report, involves the development and illustration of
a first-order PRA model for the black tiles of the TPS based on a probabilistic
analysis of different failure scenarios. In this analysis, we use mean probabilities to
construct a risk-criticality estimate for each tile and to establish a scale of priorities for
management purposes. Key features of this model are the dependencies of failures
among adjacent tiles, and between failures of tiles in specific TPS zones and failures
of the subsystems located in these zones under the orbiter's aluminum skin. The
analysis thus relies on a partitioning of the orbiter's surface (1) among zones of
temperature, debris, and aerodynamic loads, and (2) among critical system
locations. For each tile, we compute a risk-criticality factor that represents its
contribution to the overall risk of orbiter failure due to TPS failure, accounting both for
loads (load-criticality) and failure consequences at the location of the tile (functional
criticality).
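
A deliberately simplified sketch of a per-tile risk-criticality ranking follows. It multiplies a probability of tile loss (load-criticality) by a probability of loss of vehicle given loss of that tile (functional criticality), and it ignores the dependencies among adjacent tiles that the report's model includes. Tile names and probabilities are hypothetical.

```python
def rank_tiles(tiles):
    """Return (tile, risk-criticality) pairs sorted from most to least critical.

    Risk-criticality is approximated here as
    P(tile lost) * P(loss of vehicle | tile lost).
    """
    scores = {name: p_loss * p_lov for name, (p_loss, p_lov) in tiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def tiles_covering_share(ranked, share):
    """Smallest set of top-ranked tiles whose scores sum to at least `share` of the total."""
    total = sum(score for _, score in ranked)
    selected, running = [], 0.0
    for name, score in ranked:
        if running >= share * total:
            break
        selected.append(name)
        running += score
    return selected


# Hypothetical tiles: (P(tile lost per flight), P(LOV | tile lost))
tiles = {
    "tile-01": (2e-4, 0.50),   # moderate loads, directly over a critical subsystem
    "tile-02": (1e-3, 0.01),   # highest loads, but little underneath to damage
    "tile-03": (5e-5, 0.10),
    "tile-04": (1e-4, 0.02),
    "tile-05": (2e-5, 0.05),
}

ranked = rank_tiles(tiles)
top = tiles_covering_share(ranked, share=0.8)

for name, score in ranked:
    print(f"{name}: {score:.1e}")
print("tiles carrying 80% of the risk:", top)
```

In this made-up example the tile with the highest probability of loss is not the most risk-critical one, which mirrors the report's finding that functional criticality can outweigh load-criticality.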

The second phase of the work will involve refinement and implementation of
the model, including (1) an analysis of (second-order) uncertainties about
probabilities in order to determine if these uncertainties can affect management
priorities, and (2) organizational extensions. The organizational extensions involve
identification and evaluation of the mechanisms by which potential problems occur,
are detected, and can be corrected. This second phase will thus involve a study of
the maintenance process accounting for its ability to detect and correct past
mistakes (weak tiles), ensure satisfactory quality control of the current work, and track
the possibility of weakening of the TPS over time. The objective of Phase 2 will be to
identify, with the help of experts, the organizational roots of technical and human
problems and to make recommendations for possible improvements. The PRA
model will be used to assess the relevance of these factors to the reliability of the
black tiles and the effectiveness of proposed solutions.

In this study, the PRA model is not an end in itself, but a tool designed to
assess specific management practices. The level of detail of the analysis is set with
this goal in mind. One key limiting factor in this effort is the unavailability of precise
values for the probabilities of failure of the subsystems located under the orbiter's
skin conditional on burn-through. Such data would be the natural results of a
complete top-down PRA for the whole orbiter. Because NASA has chosen to do the
analysis piecemeal and only for selected subsystems, these results have not been
generated. Therefore, we use expert opinions instead of analytical results to assess
globally these conditional failure probabilities.

1.2 Scope of the work in Phase 1

As  stated  in  the  proposal,  the  objectives  of  this  first  phase  are:  (1)  to 
understand  the  basic  properties  of  the  tiles,  (2)  to  identify  the  main  experts  and 
establish  working  relationships  with  them,  (3)  to  identify  the  main  data  bases  and
sources,  (4)  to  design  the  Probabilistic  Risk  Assessment  (PRA)  model,  and  (5)  to
identify  some  of  the  relevant  organizational  features  that  affect  the  reliability  of  the 
Thermal  Protection  System  (TPS)  with  emphasis  on  the  black  tiles  and  on  the 
maintenance  process.  This  first  phase  of  the  project  was  funded  in  part  under 
SIORA  (Stanford  Space  Systems  Integration  and  Operations  Research 
Applications),  and  in  part  as  a separate  research  project  (both  under  cooperative 
agreement  NCC10-0001).  Under  the  SIORA  funding,  we  identified  some 
fundamental  issues  involved  in  the  linkage  between  the  reliability  of  the  black  tiles 
and  various  features  of  the  organizations  that  participate  directly  or  indirectly  in  their 
maintenance  (including,  but  not  exclusively,  NASA  at  the  different  space  centers, 
Lockheed  Corporation,  and  Rockwell  International).  The  problem  formulation  was 
presented  in  a paper  delivered  at  a major  Probabilistic  Safety  Analysis  conference 
(PSA'89)  held  in  Pittsburgh,  in  1989,  in  a session  chaired  by  Mr.  B.  Buchbinder
(NASA  Headquarters,  SRM&QA)  on  probabilistic  safety  assessment  for  space
systems.  This  paper  won  the  Best  Paper  Award  of  the  American  Nuclear  Society  for 
PSA'89.  It  is  included  in  this  report  as  Appendix  1. 

This Phase 1 report is organized as follows:

1. Background information: functioning, maintenance, and failure history of the
tiles.

2. Description and illustration of the PRA model: inputs, preliminary results
(means); sources of expertise and data.

3. Preliminary observations and qualitative coupling of organizational factors
and the reliability model.

1.3 Gathering of information and technical contacts

The data and the relevant information used in this study were gathered
through meetings and informal interviews of tile specialists, tile personnel
(technicians and inspectors), and management at Kennedy Space Center (NASA
and Lockheed Corporation), Johnson Space Center (NASA), and in Southern
California (Rockwell International in Downey). We conducted, in particular,
extensive (although informal) interviews of tile technicians including both old-timers
and newcomers. Several of them came from Rockwell and had participated in the
initial tile installation work. They described to us procedures and problems and
offered suggestions.

The probability estimates were obtained in two ways: frequencies of events
from official or personal records (e.g., debris hits; frequency of tile damage), and
subjective assessments (e.g., probability of failure of the subsystems under the
orbiter skin if subjected to excessive heat loads due to a hole in the orbiter's skin).
Note that:

1. The data used here for the illustration of the first-order PRA model are
realistic but coarse estimates that can be refined in the implementation part of
the second phase.

2. Second-order uncertainties about the probability estimates themselves
have not been included at this stage. The probability figures that are used
here represent implicitly the means of possible probability distributions of the
probabilities of events. Assessment of these second-order probabilities or
probability distributions for future frequencies of events (Garrick, 1988) will be
part of the implementation phase if it is judged necessary for the relevance of
the results to management decisions.

For  this  study,  the  key  technical  points  of  contact  were  the  following: 

At  KSC: 

• David Weber (Lockheed)
• Frank Jones, Susan Black, Carol Demes, and Joy Huff (NASA)

At JSC (NASA):

• James A. Smith
• Robert Maraia
• Carlos Ortiz
• Raymond Gomez

In Southern California (Rockwell, Downey):

• B. J. Scheii
• Frank Daniels
• Jack McClymonds
Section 2:

BACKGROUND INFORMATION


2.1 System description

The designers of the thermal protection system (TPS) for the space shuttle
had to solve a series of conflicting problems due to the wide range of environments in
which the orbiter has to operate. A single-component design could not meet all the
necessary requirements of withstanding extreme temperatures and vibrations while
remaining lightweight and flexible and lasting for 100 missions. Instead, a complete,
integrated system was developed relying on different components to solve different
problems (Cooper and Holloway, 1981).

In the highest-temperature areas, reinforced carbon carbon (RCC) is used.
This material is extremely heat resistant and able to withstand temperatures up to
2800°F on a reusable basis and up to 3300°F for a single flight. The use of this
material is limited to the leading edges of the wing and the nose cone. In areas of
the orbiter where heating rates are lower, a flexible reusable surface insulation
(FRSI) is used. This material is made of a silicon elastomeric coated Nomex felt,
which is heat-treated to allow using it for 100 missions at temperatures up to 700°F.
In areas where surface temperatures are above 700°F but below 1500°F, advanced
flexible reusable insulation (AFRSI) is used. AFRSI is a "blanket" composition with
one-inch stitch spacing. It consists of an outer layer of 27 mil silica "quartz" glass
fabric and of an inner layer of glass fabric ("E" glass) which encompass a silica-glass
felt material (microquartz, commonly called Q-felt). These materials have replaced
most of the 5,000 thin white tiles on the upper surface of the orbiters, originally
designated low temperature reusable surface insulation (LRSI). Their replacement
has reduced the complexity of the TPS at the cost of a slight weight increase (see
Figures 1 and 2).
Figure 1: The space shuttle orbiter

Source: Shuttle Operational Data Book, JSC 08934, Vol. 4

[Figure 2]

Source: Shuttle Operational Data Book, JSC 08934, Vol. 4

The  tiles  that  are  of  primary  interest  in  this  report  are  designated  high
temperature  reusable  surface  insulation  (HRSI)  (see  Figure  3.)  These  tiles  are 
coated  with  black  reaction  cured  glass  (RCG)  and  are  certified  for  100  missions  up  to 
a maximum  surface  temperature  of  2300°F.  Approximately  20,000  of  these  tiles  are 
used  to  cover  the  bottom  of  the  orbiter.  Among  them,  approximately  17,000  have  a 
density  of  9 pounds  per  cubic  foot  (pcf).  The  remaining  3,000  tiles  are  of  higher 
density  (12  and  22  pcf).  They  are  used  in  areas  where  higher  strength  is  needed, 
primarily  around  doors  and  hatches,  and  where  it  is  required  by  structural 
deflections.  The  22  pcf  tiles  are  capable  of  withstanding  surface  temperatures  as 
high  as  2700°F  without  shrinkage. 


These tiles, being highly brittle, have a strain-to-failure performance that is
considerably less than that of the aluminum skin of the orbiter. In addition, the tiles
have a much lower coefficient of thermal expansion. Therefore, if they were bonded
directly to the aluminum, thermal and mechanical expansion and contraction would
cause the ceramic material to crack and fail. To protect the ceramic material, the
sizes of the individual tiles were kept small (nominally 6 inches square), leaving
numerous designed gaps between them. These gaps allow for relative motion of the
tiles as the aluminum skin expands and contracts and the substructure deforms under
loading. However, this allowance is not sufficient to protect the integrity of the tiles.
In order to further isolate the tiles from local forces, a strain isolation pad (SIP) is
secured between the tiles and the skin. The SIP is a felt pad constructed of Nomex
fibers and comes in three different thicknesses (0.09, 0.115, and 0.16 inch).

The tiles are bonded to the SIP and the SIP to the aluminum skin using a
room temperature vulcanizing silicon rubber adhesive (RTV-560). In certain areas
where the aluminum skin is particularly rough and disjointed, a screed or putty
(RTV-577) is used to smooth the surface. In order for the SIP and tiles to vent during
ascent and to protect the aluminum structure from gap heating, filler bar strips
(RTV-560 coated heat-treated Nomex felt material) secured only to the aluminum
skin are placed around each piece of SIP. The porous tiles are allowed to vent since
the RCG coating does not extend to the filler bar. Between tiles in the hotter areas
(approximately 4,500 locations), gap fillers are used in addition to the filler bars to
prevent gap heating damage during reentry. The gap fillers are secured in place
with RTV. Figure 4 shows a typical black tile with all the related components.

2.2 Life cycle and maintenance operations

2.2.1 Tile manufacture and installation

Because  of  the  extreme  environment  in  which  the  orbiter  operates,  the  TPS 
must  be  made  of  only  the  purest  materials.  Contamination  of  the  tiles  during 
fabrication  could  lead  to  failure  of  the  TPS  well  before  meeting  its  100  mission 
requirement.  Raw  material  (amorphous  silica  fiber)  has  to  be  99.7%  pure  (AW  & ST, 
1976). 


Figure 4: The tile system

The fabrication process starts with a slurry of water and 1.5 micron diameter
silica. The water is drained and binder added. This mixture is compressed into
blocks slightly smaller than 1 cubic foot. After the binder sets up in 3 hours, the
blocks are dried in a microwave oven. The sintering process which locks the fibers
together requires tight heat tolerances. The blocks are baked at 2,375°F for two
hours. Next, they are cut into rough tiles (four to eight per block). Tile density and
density gradient are verified using X-rays. Since each tile is different, the tiles are
trimmed to specification using automated milling machines. A second quality check
assures that the tiles are fit for coating. The coating is sprayed on and then glazed.
A third quality check verifies the integrity of the coating. These tiles are then
internally waterproofed with a silane material. During original construction, the tiles
were next placed in arrays that matched their placement on the orbiter's surface.
Each array consisted of approximately 35 tiles. The bottoms of the arrays were then
shaved to match the shape of the orbiter. A fourth quality check verified the
dimensions of randomly selected tiles from each array. All current replacement tiles
are machined individually.

The original installation of the tiles at time of construction was done an array
at a time. The SIP was first bonded to the tiles using RTV, while a lattice of filler bars
was bonded to the orbiter. After these bonds had set, the entire array was bonded
to the orbiter. Difficulty arose in aligning the tile/SIP array with the grid of filler bars.
If the tile/SIP array is partially resting on the filler bars instead of directly on the
orbiter's skin, the strength of the TPS bond is greatly reduced. The arrays are held
in place with 2-3 psi pressure while the RTV dries. Bonds are verified using a pull
test on each tile. The strength of each test varies based on the location of the tile and
the expected in-flight loading (2 to 13 psi). Once a tile has passed this initial pull test,
it is unlikely that it will be checked again during its life cycle of 100 flights unless an
anomaly is detected.

2.2.2 Flight profile loading

During a typical mission, the tiles are subject to a wide range of loads and
temperatures. These must be considered in order to determine the limitations and
life cycle of the TPS. The description below summarizes a report by Cooper and
Holloway (1981).
Ignition of the orbiter's main engines creates an oscillatory pressure wave
that loads the tiles in the aft region of the orbiter. Though strong, this wave should
dampen rapidly. In addition, acoustic pressure created by the engines can directly
load the tiles and the aluminum skin. Any motion of the aluminum will, in turn, cause
inertial pressure on the TPS. The amount of inertial pressure depends on the local
response of the aluminum substructure, but noise levels up to 165 dB are attained
during lift off. During ascent, the tiles experience a wide range of aerodynamic loads
including: pressure gradients and shocks, buffet and gust loads, acoustic pressure
loads caused by boundary layer noise, inertial pressure caused by substructure
motion and deflection, and unsteady loads coming from vortex shedding from the
connecting structure to the external tank. Almost every tile will experience loads of
160 dB during this phase of a mission.

Since  the  tiles  are  highly  porous  (90%  void),  it  is  during  the  ascent  that  any 
internal  pressures  must  be  vented  in  order  to  equalize  with  the  external  environment. 
Because  of  this,  both  the  SIP  and  the  tiles  may  experience  varying  degrees  of 
internal  pressure.  Vent  lag  can  cause  tensile  forces  to  build  up.  In  addition,  small 
residual  tile  stresses  are  caused  by  differences  in  the  thermal  expansion  rates  of  the 
tiles  and  the  coating.  Also,  any  water  that  was  absorbed  will  cause  internal  pressure 
as  it  expands  and  contracts  with  the  temperature  changes. 

During re-entry, a second series of stresses is placed on the TPS, including:
substructure deformation, boundary layer acoustic noise, steady aerodynamic loads,
unsteady aerodynamic loads caused by boundary layer separation and vortices, and
loads from aerodynamic maneuvering. The boundary layer transition from laminar
to turbulent flow always occurs, but the time of this transition (for the same entry
trajectory) depends primarily on vehicle roughness. This roughness is divided into
two types: discrete (one single large protuberance) or distributed (many small
protuberances). Early transition results in higher turbulent flow peak
temperatures and higher total heat loads, which depend on temperature and time of
exposure (Smith, 1989). Nearly one third of the tiles on the lower surface of the
orbiter reach temperatures in excess of 1900°F and are subjected to problems of
uneven thermal expansion.

The TPS has been rigorously tested and has withstood thousands of test
cycles of limit load without failure. The system has then been certified for at least 100
flights. However, repeated exposure to the stresses and strains that accompany a
space mission can affect the integrity of the individual components. The tiles can
weaken, for example, above the densification boundary layer, the SIP can stretch as
fibers pull out of the matrix, and the RTV can creep under very high loads. It is only
through rigorous maintenance procedures and quality control verifications that the
true life cycle of the TPS can be determined and that acceptable system safety can
be achieved.

2.2.3 Tile maintenance procedure

The maintenance procedure is guided by the Rockwell specifications
(Rockwell International, 19[illegible]; 1989). It involves (1) a sequence of tile-damage
inspections and assessments after landing to decide which tiles can be mended
and which ones must be replaced; (2) tile replacement; (3) bond verification using
pull tests; (4) step and gap measurement; (5) a decision whether or not to install a gap filler.

The steps involved in the replacement of a tile are the following:

° First prefit
° Densification
° Second prefit
° Bonding of the SIP to the tile
° Cleaning of the cavity (inspection point)
° Priming of the cavity
° Mixing (and testing) of the RTV
° Application of the RTV to the tile/SIP system
° Bonding of the tile/SIP to the cavity
° Verification of the bond.


The  verification  of  the  bond  at  the  end  of  this  process  involves  a pull  test  of 
variable  strength.  One  problem  that  has  been  reported  is  that  this  pull  test  may  not 
allow  detection  of  tiles  that  are  only  partially  bonded  because  bonding  to  the 
adjacent  gap  fillers  may  provide  sufficient  strength  to  pass  the  test.  Though  these 
partial  bonds  pass  the  initial  pull  test,  they  tend  to  be  more  susceptible  to 
deterioration  over  time  and  slumping. 

Step and gap measurement is meant to ensure the smoothness of the
orbiter's surface and avoid the excessive heat loads due to vehicle roughness. It is
currently a time-consuming procedure involving 24 measurements per tile, done
manually by insertion of plastic gauges to a certain depth in the space between tiles.
The result of this inspection often leads to a decision to install standard gap fillers.
Several problems have been reported in this part of the work, including inaccurate
measurements due to misplacement of the plastic gauges. A laser system is currently
being developed to automate step and gap measurement, making it both quicker
and more reliable (Lockheed Research and Development Division, 1989; SIORA,
1990). Clearly, the corresponding reliability gain for the whole TPS depends on the
initial contribution of wrong steps and gaps and of orbiter roughness to the probability
of failure of the TPS.

Note  that  this  maintenance  procedure  is  mostly  maintenance  on  demand. 
The  only  random  testing  that  occurs  is  in  select  areas  where  a small  number  of  tiles 
are  pulled  to  determine  if  there  has  been  any  weakening  of  the  original  screed 
caused  by  initial  and  subsequent  exposures  to  waterproofing  materials.  In  the 
absence of a non-intrusive test of the bond, the fear is that the tests themselves may
weaken the tile/SIP/RTV system.

2.3 Failure history

2.3.1 Failure history

A history of the tile problems can best be described by grouping the
difficulties into three broad categories: (1) design problems, (2) processing and
maintenance induced problems, and (3) damage caused by external debris. This
information is summarized from data compiled by Carlos Ortiz at Johnson Space
Center (JSC) in Houston, Texas. It should be remembered that to date, only two
black tiles have been lost prior to or during re-entry, one due to RTV failure caused
by chemical reaction with a waterproofing agent (Challenger, Flight 41-G) and one
due to debris impact (Atlantis, Flight STS-27R). Even then, there was some
remaining material in the tile cavity prior to entry. In both cases, there was neither
catastrophic secondary tile damage, nor burn-through of the orbiter skin. This good
fortune was due in part to the location of the missing tiles and the structure under the
skin. Similar losses in different locations could have been far more costly.
Nonetheless, the TPS has done very well and proven to be far more robust than
anticipated.

With any complex system, the design process does not stop with the initial
product. Improvements occur as the system is used and weaknesses are detected.
The orbiter's TPS is no different. Revisions to the original design started before the
first launch, and have continued ever since. These properly redesigned components
have greatly increased the reliability and maintainability of the overall system.
Deficiencies that have, as of yet, gone undetected will be solved in a similar fashion
provided that they are uncovered prior to a major system failure.

Design 

During the initial design of the TPS, each component (tile, SIP, and RTV) was
certified individually, but it was not until they were combined during the construction
of the first orbiter, Columbia, that a "weak link" in the bond between the tile and SIP
was identified. Tests of the tile/RTV/SIP/Koropon as a system revealed that the
combined tensile strength was weakest at the tile-to-SIP interface. This was caused
by the RTV not impregnating the basic tile material enough to ensure adequate
attachment. The President of Rockwell Space Systems Group stated: "I think that it
is a fair criticism that we didn't define the problems more clearly as far as the
tile/strain isolation pad capabilities are concerned. We worked too hard on the
quality of the material alone and waited too long for the thermal analysis." (AW&ST,
25 February 1980.) Because of this oversight, many of the already installed tiles had
to be retested, pulled, densified, and replaced. To eliminate the "weak link", the tiles
are densified by applying a mixture of Dupont's Ludox AS and silica slip to the
underside (or inner mold line) of the tile to an approximate thickness of 0.010
inches. The result of this procedure is to move the "weak link" up into the tile material
itself. Since the minimum strength of the basic 9 pcf material is 13 psi, the majority of
the tiles now satisfy the maximum induced-load requirements. Many of the installed
tiles were known to have greater than the minimum 13 psi strength and could be
shown to have positive margins for flight loads. The tiles that could not be shown to
meet flight loads with a positive margin were replaced with 22 pcf tiles whose
minimum strength far exceeds the maximum flight loads. This additional work meant
that the 30,000 tiles on Columbia required more than 50,000 tile installations before
the first flight. Even so, not all the tiles were densified prior to the first launch, but
were deemed acceptable based on proof load testing to 1.25 times the limit stress.
For all the orbiters after Columbia, the tiles were densified during installation.
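
The ideas of a proof load and a positive margin can be made concrete with a small calculation. The sketch below is illustrative only. The margin definition (strength divided by applied load, minus one) is a common engineering convention assumed here, not a formula taken from this report, and the function names and the 10 psi example load are hypothetical. Only the 13 psi minimum strength and the 1.25 proof factor come from the text.

```python
# Values quoted in the text.
MIN_TILE_STRENGTH_PSI = 13.0   # minimum strength of the basic 9 pcf tile material
PROOF_FACTOR = 1.25            # proof load testing to 1.25 times the limit stress


def proof_load(limit_stress_psi: float, factor: float = PROOF_FACTOR) -> float:
    """Return the proof-test stress for a given limit stress."""
    return factor * limit_stress_psi


def margin(strength_psi: float, load_psi: float) -> float:
    """Assumed margin definition: strength / load - 1 (positive means acceptable)."""
    if load_psi <= 0:
        raise ValueError("load must be positive")
    return strength_psi / load_psi - 1.0


# Hypothetical tile location with a 10 psi limit stress (not a value from the report).
example_limit_stress = 10.0
example_proof = proof_load(example_limit_stress)
example_margin = margin(MIN_TILE_STRENGTH_PSI, example_limit_stress)
print(f"proof load = {example_proof} psi, margin = {example_margin:.2f}")
```

Under this assumed definition, a tile whose strength exactly equals the applied load has zero margin, and any tile with a negative value would fall into the group that the text says was replaced with 22 pcf tiles.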

Even though the overall temperatures reached during re-entry were less than
the maximum allowable, tiles in three areas were found by flight experience to be
subjected to local thermal degradation and/or unacceptable thermal gradients
resulting in a negative margin for the mid-fuselage structure. Three redesign
solutions were used to resolve these area-related problems. Tiles inboard and
forward of the main landing-gear doors (denoted as "location A" tiles) were
knowingly made thinner than the initial thermal design thickness to minimize weight
and to retain the aerodynamic mold line. The thin tiles were able to maintain the
structural temperature limits because the initial flights were flown from the Eastern
Test Range at Kennedy Space Center, while the "thermal" design trajectory was
based on launches from the Western Test Range, which put a greater heat load on
the structure. However, extensive analyses, both thermal and stress, showed
unacceptable negative structural margin due to thermal gradients. These negative
margins were initially resolved by internal structural modifications and by installing
internal heat sink material. Later, the "location A" tiles were replaced with slightly
thicker tiles (approximately 0.10 inches thicker) which still provided an acceptable
aerodynamic outer mold line based on flight data evaluation. Tiles between the
nose cone and nose landing gear were receiving excessive heating, which caused
tile slumping and subsurface flow. These tiles were eventually replaced with a much
more durable RCC chin panel. A similar problem occurred with the elevon cove
tiles. In this case, the size of the tiles was increased, thus reducing the number of
troublesome gaps. All three modifications have proven successful.

Processing and maintenance

The most critical TPS problems related to processing and maintenance have
occurred with various waterproofing agents that have affected the strength of the RTV
by reacting chemically with the bond. In addition, a significant set of other
problems has arisen because of maintenance errors. Initial waterproofing was
done with an external application of Scotchgard to the tile surfaces. This was not
totally effective because this waterproofing degraded with exposure to rain and
sunlight. On the second flight, tiles that had absorbed and trapped water fractured
when ice formed in orbit. This defined a need for an internal waterproofing agent. In
addition, the Scotchgard was found to chemically attack the RTV-560. Fortunately,
this was discovered immediately after an accidental overspray. The first internal
waterproofing agent, HMDS, was found to react with the screed (RTV-577), slowly
reverting it from solid to liquid. This interaction between waterproofing and screed
was not immediate, and eventually led to the loss of a black tile. Fortunately, the
other nearby tiles affected by the softened screed did not fail during reentry. A
second generation of waterproofing, DMES, has been developed and proven
successful. However, the long-term, residual effects of the outdated HMDS are still
causing concern.

Several chemical spills during tile installation have necessitated the removal
and rebonding of nearly 1,000 tiles. These spills, involving an oxidizer on Columbia
and hydraulic fluid on Challenger, demonstrate the sensitivity of the tiles and their
bonds to their maintenance environment. Another incident involved the mislabeling
of a container of the bonding agent. RTV-566 was labeled as RTV-560, which has a
shorter drying time. The bonds were not allowed to cure for the appropriate time and
thus were weaker than allowed. This discrepancy was caught during final pull
testing. Finally, during a return flight from California to Florida on the back of a 747,
the orbiter Columbia was flown through a rainstorm, damaging over 1,000 tiles, of
which 250 needed replacement.

Debris

Since the first flight, the orbiter has always been exposed to external debris
damage. Table 1 summarizes the damage by listing total number of hits and major
hits (greater than 1 inch). Simple statistical analysis demonstrates the great
variation that has occurred (Total Hits: mean = 179, standard deviation = 157; Hits
> 1": mean = 51, standard deviation = 60). This variability is further highlighted in
Figure 5, which shows histograms of the debris damage (for the upper graph,
number of flights as a function of the total number of debris hits; for the lower graph,
number of flights as a function of the number of hits greater than one inch). For the
first flights (until STS-27R), the actual major source of debris was found to be
portions of SOFI insulation from the External Tank (ET). During STS-27R, the
orbiter's TPS experienced significantly more debris damage than on any previous
flight, including the loss of a large portion of one black tile (Orbiter TPS Damage
Review Team, STS-27R, 1989). Based on the pattern of damage and the recovery of
actual debris material lodged in the tiles, AFRSI, and gaps, it was possible to

Sequence  Designation  Orbiter     Date      Major Debris  Total Debris
                                             Hits > 1"     Hits

 1        1            Columbia    04/12/81    •             •
 2        2            Columbia    11/12/81    •             •
 3        3            Columbia    03/22/82    •             •
 4        4            Columbia    06/27/82    •             •
 5        5            Columbia    11/11/82    •             •
 6        6            Challenger  04/04/83    36            120
 7        7            Challenger  06/18/83    48            253
 8        8            Challenger  08/30/83    7             56
 9        41A          Columbia    11/28/83    14            58
10        41B          Challenger  02/03/84    34            63
11        41C          Challenger  04/06/84    8             36
12        41D          Discovery   08/30/84    30            111
13        41G          Challenger  10/05/84    38            154
14        51A          Discovery   11/08/84    20            87
15        51C          Discovery   01/24/85    28            81
16        51D          Discovery   04/12/85    46            152
17        51B          Challenger  04/29/85    63            140
18        51G          Discovery   06/17/85    144           315
19        51F          Challenger  07/29/85    226           553
20        51I          Discovery   08/27/85    33            141
21        51J          Atlantis    10/03/85    17            111
22        61A          Challenger  10/30/85    34            183
23        61B          Atlantis    11/26/85    55            257
24        61C          Columbia    01/12/86    39            193
25        51L          Challenger  01/28/86    •             •
26        26R          Discovery   09/29/88    55            411
27        27R          Atlantis    12/02/88    250           707
28        29R          Discovery   03/11/89    23            132
29        30R          Atlantis    05/04/89    56            151
30        28R          Columbia    08/08/89    20            76
31        34R          Atlantis    10/18/89    18            53
32        33R          Discovery   11/22/89    21            118
33        32R          Columbia    01/09/90    15            120

Table 1: Summary of orbiter flights and debris damage

Figure 5: Histogram of tile damage due to debris.

Indicates the number of flights that experienced a specified amount of debris damage (i.e., four
flights had 40-60 total hits, two different flights had 60-80 total hits, etc.) based on available data
for the first 33 flights (missing: first five missions and STS-51L)

determine that much of the severe damage was caused by insulation from the cone
area of the right SRB. Other damage, minor but more extensive than usual, was
caused by the insulation of the ET. This was similar to the type of damage that had
been experienced in previous flights. In addition, an in-depth analysis done at the
time concluded that there was no obvious correlation between tile damage and
launch conditions that might affect ice formation, which was considered earlier a
possible source of tile impact damage (Orbiter TPS Damage Review Team,
STS-27R, 1989).
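
The summary statistics quoted for Table 1 can be checked directly from the tabulated counts. The sketch below is a minimal illustration: it transcribes the 27 flights with available data (the first five missions and STS-51L are excluded) and assumes that the quoted standard deviations are sample standard deviations (n - 1 denominator). That assumption is not stated in the text but appears consistent with the quoted figures. The variable and function names are illustrative.

```python
from statistics import mean, stdev

# (major hits > 1 inch, total hits) for the 27 flights with data in Table 1,
# in flight order: STS-6 through 61C, then 26R through 32R.
DEBRIS_HITS = [
    (36, 120), (48, 253), (7, 56), (14, 58), (34, 63), (8, 36), (30, 111),
    (38, 154), (20, 87), (28, 81), (46, 152), (63, 140), (144, 315),
    (226, 553), (33, 141), (17, 111), (34, 183), (55, 257), (39, 193),
    (55, 411), (250, 707), (23, 132), (56, 151), (20, 76), (18, 53),
    (21, 118), (15, 120),
]

major = [m for m, _ in DEBRIS_HITS]
total = [t for _, t in DEBRIS_HITS]


def summarize(values):
    """Return (mean, sample standard deviation), each rounded to an integer."""
    return round(mean(values)), round(stdev(values))


total_mean, total_sd = summarize(total)
major_mean, major_sd = summarize(major)
print(f"Total hits: mean = {total_mean}, standard deviation = {total_sd}")
print(f'Hits > 1": mean = {major_mean}, standard deviation = {major_sd}')
```

In both columns the standard deviation is comparable to the mean, which is the "great variation" the text refers to. It is driven largely by a few flights such as 51F and 27R.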

Figure 6 displays on one orbiter surface a cumulative recording of all
significant tile damage from all flights and all orbiters (through STS-32R). The
damage is obviously not uniformly distributed, and certain tiles are much more likely
to be damaged than others. Computer models developed by Ray Gomez at JSC
have been able to show how insulation from both the SRBs and the ET could cause
such damage (see Figures 18 and 19 in Section 3). The complexity of the problem
does not currently allow for a direct and focused backtracking from a tile on the
orbiter to a particular spot of insulation because the trajectory depends on many
factors (e.g., the velocity of the orbiter and the angle of attack). It may be possible,
however, to determine roughly the initial location and the size of loose insulation
necessary to inflict specific damage (location and severity) to the tiles.

Debonding of tiles due to factors other than debris impact

To date, as mentioned above, only one black tile has been lost due to factors
other than debris impact (in that case, chemical reversion of the screed). There are
several reasons for unsatisfactory bonds: 1) improper alignment during installation,
2) failure to comply with RTV drying limitations, 3) chemical reversion of the screed or
RTV, and 4) possible weakening of various components in the TPS under repeated
load cycles. An initial investigation of a small discrete set of tiles showed that a high
proportion of the bonds that had passed the pull test were later found to be
unsatisfactory (see Figure 7). Since then, however, this number has been found to


Figure  6:  Accumulated  major  debris  hits  (lower  surface) 
for  flights  STS-6  through  STS-32R 
Source  of  data:  J.  McClymonds.  Rockwell  International 

Figure 7: The tile system and bond verification
Source:  Lockheed  Corporation  (1989).  R.  Welling.  Reproduced  by  permission 

be much smaller. A recent and on-going evaluation of all 9,045 tiles using the 0.090
and 0.115 inch SIP has shown that of the 6,517 tiles evaluated to date, only 8
showed anomalous conditions (most of which, but not all, were subnominal bonds).
So far, during normal maintenance and the replacement of debris-damaged tiles, 12
tiles have been found to have no bond between the SIP and the orbiter's skin. These
tiles were only held in place by the gap filler's bond to adjacent tiles.
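
The evaluation figures above translate into a simple observed anomaly rate. The sketch below is a rough illustration, not part of the original study. It assumes that the tiles not yet evaluated are statistically similar to those already inspected, which the text does not establish, and the names used are hypothetical.

```python
# Counts quoted in the text.
TOTAL_TILES = 9045   # tiles using the 0.090 and 0.115 inch SIP
EVALUATED = 6517     # tiles evaluated to date
ANOMALOUS = 8        # tiles showing anomalous conditions


def observed_rate(found: int, inspected: int) -> float:
    """Fraction of inspected tiles that showed an anomaly."""
    if inspected <= 0:
        raise ValueError("inspected must be positive")
    return found / inspected


remaining = TOTAL_TILES - EVALUATED
rate = observed_rate(ANOMALOUS, EVALUATED)

# Projection under the (unverified) assumption that uninspected tiles
# behave like the inspected ones.
expected_remaining = rate * remaining

print(f"observed anomaly rate: {rate:.4%}")
print(f"tiles still to evaluate: {remaining}")
print(f"expected additional anomalies: {expected_remaining:.1f}")
```

On these numbers the observed rate is roughly one anomaly per 800 tiles, far below the high proportion suggested by the initial small sample.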

As mentioned earlier, the SIP is bonded to each tile using RTV while the filler
bars  are  bonded  to  the  skin.  After  all  these  bonds  have  firmed,  a layer  of  RTV  is 
placed  on  the  skin  in  the  hole  defined  by  the  filler  bars.  The  tile/SIP  combination  is 
then  held  in  place  completing  the  installation.  If  the  tile/SIP  combination  is  not 
aligned  correctly  with  the  filler  bars,  the  SIP  may  rest  on  the  filler  bars  and  never 
touch  RTV  or  skin.  Obviously,  these  tiles  will  have  very  poor  bonds.  In  several  cases 
the  tiles  were  placed  correctly  between  the  filler  bars,  but  directly  over  exposed 
sensor  wires.  These  wires  prevented  complete  contact  between  the  SIP  and  the 
RTV  and  thus  made  for  a weak  bond.  It  should  be  noted  that  even  with  no  primary 
bond  between  the  SIP  and  the  skin,  tiles  have  still  passed  the  pull  tests  (because  of 
the  gap  filler  bonds)  and  that,  as  of  yet,  no  tile  has  been  lost  due  to  poor  installation. 

If  the  RTV  is  allowed  to  dry  before  the  tile/SIP  combination  is  placed  on  it,  the 
bond will not develop to its full potential. This can happen when several tiles are
being placed at one time, and a single batch of RTV is mixed for the several prepared
sites. If the installers are not careful, the RTV may exceed its "pot life", i.e., the age
beyond its safety margin, before the last tile is placed.

The  chemical  transformation  of  the  RTV  is  very  sensitive  to  temperature 
and  humidity  and  must  be  monitored  carefully  during  installation.  In  several  cases, 
the  curing  time  of  the  RTV  has  been  reduced  by  the  installers  using  water  (or  saliva). 
Such  a procedure,  which  is  explicitly  forbidden,  is  not  believed  to  affect  the 
immediate  strength  of  the  bond,  but  may  reduce  its  life.  A similar  class  of  problems 
has occurred when the aluminum surface has not been properly prepared. In this
case, the RTV bond may fail at the interface with the orbiter's skin.

The only black tile that has been lost due to debonding not caused by debris
was lost when the first internal waterproofing agent, HMDS, reacted chemically with
the screed, causing it to soften and revert to its more viscous form. The formula
of the waterproofing agent has since been changed so that it will not affect the
screed. This new waterproofing agent has completed 50 mission cycles of
combined-environment testing, and no weakening of the TPS was found.
Yet, careful monitoring is required to ensure that no residual amounts of the old
HMDS agent are causing a very slow reversal reaction and, eventually, loss of tiles.
The current HMDS testing procedures involve removing two or three tiles after each
flight to check the chemical composition of the screed. To date no additional
problem has been found.

In the long term, repeated exposure to load cycles and environmental
conditions of heat and humidity on the ground may weaken some of the TPS
components and, eventually, cause tile failure. The most vulnerable tiles are those
with no bond or very little bond (e.g., less than 10% of the surface) between the SIP
and the orbiter's skin, and that are held primarily by the gap filler's RTV bond to the
adjacent tiles. RTV bonds, so far, have not shown visible signs of deterioration over
time and load cycles. It is known, based on extensive testing, that the hundred-flight
certification is justified for well-bonded tiles. What will happen in the future, however,
is uncertain.

After some flights, several cases of slumping (sagging) tiles have been
observed. These are easily identified visually since they break the smooth surface of
the orbiters. According to David Weber at KSC, the most common cause of
slumping is a weakening of the SIP's fibers due to repeated load cycles.
Pre-densification testing showed that the part of the tile located right above its
interface with the SIP was the weakest part and was most likely to be affected by
repeated load cycles. With densification, this weakest zone has moved, on one
hand, further up into the tile, and on the other hand, down into the SIP itself. A
problem in either location is difficult to detect if there is no overt visual clue. Yet,
once again, to date no tile has been lost due to repeated load cycles.

2.3.2  Data 

Three data bases have been identified and described by Ellen Baker and
Bonny Dunbar as part of their TPS Trend Analysis Survey (March, 1988). They are:

PRACA (Problem Reporting and Corrective Action), which is managed by
NASA. Tile problems constitute only a subset of these data. The
information regarding the tiles can be accessed at KSC.

TIPS (Tile Information Processing System), which is managed by
Rockwell (Downey, California). The specialist is Ms. B. J. Schell,
supervisor of the TPS Data Systems at Rockwell International, Downey,
California. The information can be accessed at Downey, JSC, and KSC.

PCASS (Program Compliance Assurance and Status System), which is
part of a NASA (agency-wide) System Integrity Assurance Program Plan.
PRACA and TIPS are described in Appendix 2. The survey conducted in
1988 by Baker and Dunbar showed that a trend analysis was judged highly
desirable:

1. To monitor the performance of the TPS in order to ensure conformance
   with design requirements.
2. To ascertain long term effects of TPS-related procedures (repairs, etc.).
3. To enable engineering design changes to system failure.

The participants in the survey indicated that there was a need for a single
user-friendly data base including all useful data and, in particular, results of trend
analysis. They would want to have routine access to this data via a local PC or
terminal. As we show in Section 4, the risk-criticality index that we have developed
can be an important part of the record for trend analysis because it represents the
relative contribution of each tile to the probability of LOV due to TPS failure. These
probabilities can be updated on the basis of new information and the results can be
encoded for all tiles that share similar characteristics.

Section 3:

DESCRIPTION OF THE PRA MODEL FOR THE TILES

3.1 Susceptibility and vulnerability

Our probabilistic risk assessment (PRA) model for the black tiles of the
thermal protection system (TPS) of the space shuttle is based on two major factors:
susceptibility of the tiles to damage and vulnerability of the shuttle once tile damage
has occurred. The terms susceptibility and vulnerability have been standardized in
the study of aircraft combat survivability; their use in the space shuttle context may
facilitate the understanding of the problem.
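
The two-factor structure can be written as a simple product: for each group of tiles, the probability of tile loss (susceptibility) multiplied by the probability that the vehicle is lost given that loss (vulnerability). The sketch below only illustrates that decomposition. It is not the model developed in this report. The zone probabilities are hypothetical numbers chosen for the example, and summing across zones assumes the zone events are rare and treated as mutually exclusive.

```python
from typing import Iterable, Tuple

# Each zone is (P(tile loss in zone), P(loss of vehicle | tile loss in zone)).
Zone = Tuple[float, float]


def loss_of_vehicle_probability(zones: Iterable[Zone]) -> float:
    """Sum of susceptibility x vulnerability over zones (rare-event approximation)."""
    risk = 0.0
    for p_tile_loss, p_lov_given_loss in zones:
        if not (0.0 <= p_tile_loss <= 1.0 and 0.0 <= p_lov_given_loss <= 1.0):
            raise ValueError("probabilities must lie in [0, 1]")
        risk += p_tile_loss * p_lov_given_loss
    return risk


# Hypothetical zones: one frequently hit but tolerant, one rarely hit but critical.
example_zones = [
    (1e-3, 0.1),   # high susceptibility, low vulnerability
    (5e-4, 0.5),   # lower susceptibility, high vulnerability
]
example_risk = loss_of_vehicle_probability(example_zones)
print(f"illustrative P(LOV due to TPS) per flight = {example_risk:.2e}")
```

In this example the zone that is hit less often contributes more to the total, which shows why both factors have to be considered together.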


Susceptibility of the tile system to damage is determined by the combination
of loads on the tile and its capacity (strength) to withstand them. Failure occurs when
the loads exceed the capacity. The problems can generally be divided into two
categories: (1) tile loss caused by excessive external loads and (2) tile loss under
regular loads caused by weaknesses in the tile system (debonding due to factors
other than debris impact). A third possibility (a combination of the two) is the case
where external loads not severe enough to cause the loss of a well-bonded tile
cause the loss of a weakened tile. In this study, this case is treated as a subset of
the first category. Historically, the vast majority of excessive external loadings has
been from debris, mostly from the external tank and the solid rocket boosters
(defective insulation and ice). Also included in this category is space debris.
Depending on the size and energy of the debris hitting the orbiter, several tiles can
be damaged simultaneously. It is also conceivable that the reentry temperature may
exceed the designed capabilities of the tiles, leading to tile failure or burn-through
(for example, due to severe malfunction of the guidance system).
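
The statement that failure occurs when the load exceeds the capacity can be illustrated with a standard load-versus-capacity calculation. The sketch below is an assumption-laden example, not the report's method. It treats load and capacity as independent normal random variables, and every numerical parameter is hypothetical. With those assumptions the margin (capacity minus load) is also normal, so the failure probability is the probability that the margin is negative.

```python
from math import sqrt
from statistics import NormalDist


def failure_probability(load_mean: float, load_sd: float,
                        cap_mean: float, cap_sd: float) -> float:
    """P(load > capacity) for independent normal load and capacity."""
    # Margin = capacity - load is normal with these parameters.
    margin = NormalDist(cap_mean - load_mean, sqrt(cap_sd ** 2 + load_sd ** 2))
    return margin.cdf(0.0)


# Hypothetical values in psi, chosen only to contrast the two cases.
p_well_bonded = failure_probability(load_mean=6.0, load_sd=2.0,
                                    cap_mean=13.0, cap_sd=2.0)
p_weakened = failure_probability(load_mean=6.0, load_sd=2.0,
                                 cap_mean=7.0, cap_sd=2.0)

print(f"well-bonded tile: P(failure) = {p_well_bonded:.4f}")
print(f"weakened tile:    P(failure) = {p_weakened:.4f}")
```

The contrast mirrors the third case described above: the same load distribution that rarely threatens a well-bonded tile can readily remove a weakened one.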


Capacity reduction caused by weaknesses of the tile system accounts for tile
losses caused by long-term deterioration of the RTV, defective bonds not caught
during installation, and tile bonds weakened due to improper maintenance
procedures, waterproofing, and spills. These weaknesses could affect a single tile
(tile resting on its filler bar) or a group of tiles (use of a weak batch of RTV). Tile
susceptibility  can  therefore  be  reduced  by  controlling  the  external  debris,  improving 
tile  installation  and  maintenance  procedures,  and  developing  new  tests 
(non-destructive  pull  tests  and  other  types  of  tests)  to  ensure  bond  verification. 
Another  approach  to  reducing  the  susceptibility  of  the  tile  system  that  will  not  be 
considered  in  this  study  would  be  to  harden  the  tiles  so  that  the  impact  of  external 
debris  would  not  cause  any  damage.  Extensive  use  of  RCC  would  be  one  such 
solution,  but  at  the  cost  of  a significant  increase  of  weight  and  design  complexity,  as 
well  as  an  enormous  additional  expense. 

The  vulnerability  analysis  starts  with  the  premise  that  a tile  has  been  lost  for 
whatever  reason,  then  proceeds  to  analyze  the  effects  of  this  loss  on  the  shuttle's 
performance  and  safe  return.  Of  primary  concern  in  this  phase  is  the  layout  of  the 
shuttle systems immediately below the shuttle's skin. A heating or burn-through of
the  skin  could  cause  the  loss  of  various  hydraulic  lines,  computers,  fuel  tanks,  or 
even  a weakening  of  the  structural  integrity  of  the  spacecraft.  Also  included  in  the 
vulnerability  analysis  is  the  effect  of  an  initial  loss  on  the  surrounding  tiles.  When  the 
TPS  was  developed,  it  was  feared  that  one  hole  could  lead  to  adjacent  tiles  peeling 
off  because  of  reentry  heating  (the  so-called  zipper  effect).  This  phenomenon  has 
not occurred in the two instances where tiles have actually been lost. Yet, the loss of
a tile clearly causes local turbulence and directly exposes the side of the next
tile/SiP/RTV  system  to  high  loads  (forces  and  heat).  The  probability  of  loss  of  a 
secondary  tile,  although  obviously  not  equal  to  one,  is  still  higher  than  the  probability 
of  loss  of  the  first  tile  in  a patch.  If  not  checked,  the  loss  of  subsequent  tiles  could 
lead  to  exposure  of  a much  larger  patch  of  the  shuttle’s  skin.  The  vulnerability  of  the 
orbiter  could  be  reduced  by  moving,  hardening,  or  increasing  the  redundancy  of 
various  critical  control  systems.  If  the  tile  damage  can  be  discovered  prior  to  reentry, 
then, in some cases, the vulnerability of the shuttle could be reduced (either by
protecting the exposed patch or by rerouting, draining, or securing exposed lines
and tanks). In addition, by changing the reentry flight profile of the shuttle, it may be
possible to reduce the temperature of some weak, vulnerable areas. The sequence
of events that is studied in this analysis is shown in Figure 8.

Figure 8: Event diagram: failure of the TPS leading to LOV

The structure of the probabilistic model used in the analysis (Figure 9) follows
closely that of the elements presented in Figure 8. It includes: (1) initiating events
(probability distributions for the number of tiles initially lost due to debris and to
debonding caused by other factors), (2) final patch size (probability distribution of the
number of adjacent tiles lost conditional on the loss of the first tile), (3) burn-through
(probability of burn-through conditional on a failure patch of a given size), (4) system
loss (probability of failure of systems under the skin conditional on a burn-through),
and (5) loss of orbiter (probability of LOV, conditional on failure of subsystems due to
burn-through). The analysis is thus done using the usual mix of probabilities
estimated through frequencies, and of subjective probabilities when needed (e.g., for
the probabilities of failure of subsystems under the skin for which no formal PRA
studies have been done). Bayesian formulas were used to compute the probabilities
of different scenarios as described further in this section.


Note that, in this study, we did not account for excessive heat loads (above
the design criteria) causing the burning of a tile due, for example, to tile design
problems or to a malfunction in the guidance system and/or the control surfaces.
Although this failure mode may contribute to the overall risk of failure of the orbiter's
TPS, it was considered here that these initiating events now have a much lower
probability than the loss of a tile due to debris damage and/or debonding caused by
other factors.

Discrete random variable: number of initial tiles lost due to debris
Discrete random variable: number of initial tiles lost due to debonding
Discrete random variable: number of additional tiles lost given initial tile damage
Continuous random variable: severity of burn-through given a patch size of missing tiles
Binary random variable: subsystem failure occurs given level of burn-through
Binary random variable: LOV occurs given loss of subsystems

Figure 9: Event tree of LOV due to TPS failure

We did not account for dependencies among the probabilities of failures of
subsystems under the skin due to TPS failure; for example, two redundant elements
of the hydraulic system could be crippled during the same flight by loss of tiles in two
different locations. The probability of such simultaneous failures was considered to
be too small. Finally, we did not account for dependencies among tile failures
caused by the repetition of the same mistake (e.g., from the same technician) which
becomes a common cause of failure (for example, addition of water to the RTV mix
and treatment of several tiles). This concern will be part of the second phase of the
study.

3.2 Definition of min-zones

Because of the factors described above, the black tile system cannot be
treated as a uniform structure. Debris is more likely to hit some parts of the orbiter
than others, different bonding materials are used in different areas, temperatures
vary considerably over the surface, and critical subsystems are located only in a few
areas. Therefore, for this analysis, the entire tile protection system is subdivided into
smaller areas, called here min-zones, such that all tiles of a specific min-zone have
the same level of susceptibility and vulnerability. Depending on the number of
discriminating characteristics, the number of tiles in each min-zone could
conceivably vary from a single tile to thousands. (An alternative approach would be
to categorize each tile individually with regard to susceptibility and vulnerability, but
since most adjoining tiles have identical characteristics, this level of detail is not
needed.)




Pate-Cornell and Fischbeck


The  definition  of  min-zones  is  critical  to  the  analysis.  The  number  of  factors 
used  to  delineate  the  min-zones  determines  the  complexity  of  the  problem.  As  an 
initial cut, we define a min-zone by four factors: (1) susceptibility to debris impact, (2)
potential  for  loss  of  additional  tiles  following  the  loss  of  the  first  one  (depending  on 
heat  and  aerodynamic  loads),  (3)  potential  for  burn-through  given  one  or  more 
missing  tiles  (heat  loads),  and  (4)  criticality  of  underlying  systems.  For  this  study,  it  is 
assumed  that  the  probability  of  debonding  caused  by  factors  other  than  debris 
impact is uniform over the orbiter's surface and does not require a separate partition
of  this  surface.  As  mentioned  above,  it  is  also  assumed  that  flight  profiles  will  not 
expose  the  entire  TPS  to  severe  temperatures  that  would  exceed  their  specifications. 


3.2.1 Debris classification

In  order  to  account  for  the  fact  that  debris  damage  during  ascent  is  not 
uniformly  distributed  across  the  underside  of  the  orbiter,  the  black  tiles  are 
partitioned  into  three  debris  areas  such  that  all  tiles  in  a particular  area  have  roughly 
the  same  probability  of  being  initially  damaged  by  external  debris.  The  definition  of 
these  debris  areas  also  accounts  for  the  fact  that  some  areas  are  more  susceptible  to 
being  hit  by  large  pieces  of  debris  that  will  damage  several  adjacent  tiles 
simultaneously. 


To define the debris zones, we plotted all known debris damage from the first
33 flights on a single shuttle layout (see Figure 6). These data came from J. W.
McClymonds (1989) at Rockwell in Downey. Areas with similar damage intensity
were grouped together into high, medium, and low debris damage areas (see Figure
10). An estimated probability of tile damage due to debris per flight was determined
by dividing the number of hits by the number of tiles in each area and by the number
of flights. A similar plot and calculation were done for all damage to black tiles over
one inch in size. (Historically, about one fourth of the damage has been greater than
one inch in size.) It should be noted that the only missing tile to date caused by
debris is in one of the "high debris damage areas".

[Figure 10 legend: high, medium, and low probability of debris damage]

Figure 10: Partition of the orbiter's surface into three types of
debris zones (index: h)

Based on this analysis, the probabilities of a specific tile receiving any debris
damage were assessed as shown in Table 2. The probability of multiple tile
damage was calculated using a typical six-inch by six-inch square tile and estimating
the percentage area, within a 1/2 inch border, that would allow for other tiles to be hit
simultaneously with sufficient energy to cause significant damage.
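The per-tile, per-flight estimate described above (hits divided by tiles and by flights) can be sketched as follows. This is only an illustration: the hit and tile counts below are hypothetical placeholders, not data from the study, and the function name is ours.

```python
def per_tile_hit_probability(hits: int, tiles: int, flights: int) -> float:
    """Estimate the probability that a given tile is hit by debris on one flight.

    Follows the rule described in the text: number of hits divided by the
    number of tiles in the area and by the number of flights.
    """
    if tiles <= 0 or flights <= 0:
        raise ValueError("tiles and flights must be positive")
    return hits / (tiles * flights)


# Hypothetical counts for one debris area (NOT taken from the report).
p_hit = per_tile_hit_probability(hits=330, tiles=1000, flights=33)
print(p_hit)
```

With these placeholder counts the estimate has the same order of magnitude as the high-debris-area entry of Table 2, which is the only reason they were chosen.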


Debris Area                    High          Medium        Low
P(Single tile hit)             10^-2         3 x 10^-3     5 x 10^-4
P(One of two tiles hit)*       8 x 10^-4     2 x 10^-4     4 x 10^-5
P(One of three tiles hit)      7 x 10^-5     2 x 10^-5     3 x 10^-5

*P(one of x tiles hit) = probability that a particular tile is in a group of x adjacent hit tiles

Table 2: Probabilities of debris hits in the different areas shown in Figure 10

Translating this information into the probability that a specific tile will be
knocked off or so significantly damaged as to burn off during reentry is a more
difficult task. It is logical to assume that the probability of this level of damage is the
ratio of the number of destructive hits to the total number of hits in the past. Since
one tile has been lost out of roughly two thousand significant debris hits, it is
proposed, in this study, to use an initial estimate of 1 in 2,000 (5 x 10^-4) for the
probability that large hits would destroy a tile's insulating capability in the high debris
areas. Slightly smaller probabilities were used in the medium and low debris areas.

The probabilities of tile loss due to debris hits for each tile in each area of Figure 10
have been further allocated as shown in Table 3. For example, the probability of a
single tile loss in the "high" debris area is the product of (1) the probability that the tile is
hit by debris, (2) the probability that the size of the hit is greater than 1" conditional
on a hit, and (3) the probability that the tile is knocked off given a large debris hit.

Debris Area                    High          Medium            Low
P(Single tile lost)            1.3 x 10^-6   10^-7             10^-9
P(One of two tiles lost)*      10^-7         10^-8             0
P(One of three tiles lost)     10^-8         10^-[illegible]   0

*P(one of x tiles lost) = probability that a particular tile is in a group of x adjacent lost tiles

Table 3: Probabilities of tile loss due to debris in the different areas shown in Figure 10
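As a check on the high-debris-area entry of Table 3, the three factors named in the text can be multiplied directly. The sketch below uses only values quoted in this section (Table 2, the "about one fourth" fraction of hits over one inch, and the 1-in-2,000 loss estimate); the variable names are ours.

```python
# Factors quoted in the text for the high debris area.
p_hit = 1e-2                # Table 2: P(single tile hit)
p_large_given_hit = 0.25    # about one fourth of the damage exceeds one inch
p_loss_given_large = 5e-4   # one tile lost in roughly 2,000 significant hits

# Probability that a given tile is lost to a single-tile debris hit on one flight.
p_single_tile_lost = p_hit * p_large_given_hit * p_loss_given_large
print(f"{p_single_tile_lost:.2e}")
```

The product is about 1.25 x 10^-6, which agrees with the 1.3 x 10^-6 reported in Table 3 to within rounding.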


3.2.2 Burn-through

In a similar fashion, the tiles are partitioned into three burn-through areas
(see Figure 11). The probability of a burn-through is dependent on two factors: the
temperature that the surface reaches during reentry (and for how long), and the
ability of the unprotected aluminum skin to dissipate the heat build-up. The denser
and stronger the structure under the skin, the greater the capacity to resist
burn-through. In both cases where tiles have been lost, burn-through has not
occurred in part for this reason. The larger the patch of missing tiles, the greater the
likelihood of burn-through. The probabilities shown in Table 4 were estimated from
information provided by Robert Maria of NASA Johnson Space Center in Houston.
Once again, these are only coarse estimates.


Burn-through Area              High     Medium   Low
P(Single tile lost)            0.2      0.1      0.001
P(One of two tiles lost)*      0.7      0.2      0.01
P(One of three tiles lost)     0.95     0.7      0.1

*P(one of x tiles lost) = probability that a particular tile is in a group of x adjacent lost tiles

Table 4: Probabilities of burn-through due to tile loss in areas shown in Figure 11

[Figure 11 legend: high, medium, and low probability of burn-through]

Figure 11: Partition of the orbiter's surface into three types
of burn-through zones (index: k)

Note that the two areas just inboard of the main landing gear have been
notated as being in the high burn-through area. This is not, strictly speaking, a
burn-through problem. The structure in those areas is extremely sensitive to
temperature differences and would fail even without a burn-through. However,
because of their sensitivity to temperature, these two areas were grouped in the high
burn-through category.

3.2.3 Secondary tile loss classification

In order to account for the potential of a single tile causing the loss of
adjacent tiles, the orbiter is divided into two secondary tile loss areas (see Figure
12). The probability of additional tile loss depends on the aerodynamic forces and
on the magnitude and duration of the increased reentry temperatures that occur
around a missing tile due to the disruption of the laminar flow. This increase of
temperature also depends on the ability of the skin to dissipate the heat build-up.
The RTV bond will fail above 300°F. Because of this, the secondary tile loss areas
are related to the temperature areas used in the burn-through analysis above. In this
study, the two secondary tile loss areas will be defined by the probability of adjacent
tile loss shown in Table 5. These values were estimated from information provided
by Robert Maria from NASA at JSC.


Zone 1 (high loads): P(Additional tile lost | One tile lost) = 10^-2
Zone 2 (low loads):  P(Additional tile lost | One tile lost) = 10^-3

Table 5: Probabilities of losing adjacent tiles
due to initial tile loss in areas shown in Figure 12

A failure patch is defined as a group of lost tiles that started from one
initiating event (initial tile loss) and has reached its maximum size. The size of a
failure patch depends on the number of tiles initially damaged and on the
subsequent vulnerability of the adjacent tiles.


3.2.4 Functional criticality classification

The varying criticality of the subsystems of the orbiter located under the
aluminum skin is handled by partitioning the tiles into three functional criticality
areas. Once a burn-through has occurred, various systems would be exposed to
extreme heat and would fail. If those systems were essential for flight, their failure
could lead to the loss of the orbiter. By examining the location of critical systems
(electrical, hydraulic, fuel, etc., as shown in Figures 13 and 14), three areas were
identified (Figure 15). The following probabilities were estimated by assuming that a
burn-through would cause an area of four square feet around the hole to be exposed
to hot gases.


Area of high functional criticality:   P(Loss of orbiter | Burn-through) = 0.8
Area of medium functional criticality: P(Loss of orbiter | Burn-through) = 0.2
Area of low functional criticality:    P(Loss of orbiter | Burn-through) = 0.05

Table 6: Probabilities of LOV conditional on burn-through in functional criticality
areas shown in Figure 15

3.2.5 Debonding caused by factors other than debris impact

In this model, it is assumed that the probability of debonding caused by
factors other than debris impact is the same for all tiles. In reality, the location of
screed, thin SIP, and gap filler as well as the age of RTV, and the temperature and
pressure zones would affect the probability of debonding. Short of conducting
considerable additional research, this simplification should be adequate. Again, the
probabilities used for illustration are only coarse estimates that are intended to
provide an idea of the relative magnitude of the debonding problem to the debris
problem. Another relationship not considered directly in this analysis is the effect of
weak bonding on the susceptibility of a tile to debris impact. A weakened tile is much
more likely to be dislodged by a medium-sized debris hit. For the purposes of this
model, with its uniform distribution of debonding, this factor is included in the debris
analysis.

Figure 13: Component and systems location

Of the approximately 130,000 black tiles that have been installed at various
times on all the orbiters, [number illegible] have been found during maintenance to
have no bond other than through the gap fillers. A complete analysis of tile capacity,
as revealed by the maintenance observations, will be part of the second phase of this
work. We assumed, for the moment, that about half of the unbonded tiles that are held
in place by the gap fillers have been detected by now, either because of visible
slumping or because they have been replaced for other reasons such as debris
damage (about 25% so far have been replaced). Those with no bond that have not
been detected so far are those that have not yet shown visible signs of weakness and
have not needed replacement.

David Weber from KSC estimated that a tile with this weak a bond would
have a probability of failure of one in a hundred (10^-2) per flight, making the
probability of debonding of this kind, for any tile, approximately 9.0 x 10^-7 per
flight. Estimating the probabilities for the other types of debonding (excluding those
caused by debris impact) is more subjective. We used a previous Lockheed study of
bond verification (see Figure 16) and confirmed the results during discussions with
David Weber. This study gives relative values of the probabilities of different
debonding modes. Following these results, we assumed that chemical reversion of
the screed and weakening due to repeated exposure to load cycles are less likely to
cause debonding, and we used a probability of failure of 2 x 10^-7 per tile and per
flight. As a further simplification, these two probabilities (weakening due to repeated
exposure to load cycles and insufficient bonding) are assumed to be independent
and can thus be added. In actuality, poorly bonded tiles or tiles resting on soft
screed are likely to be much more susceptible to this kind of weakening. Using these
values, the probability of losing at least one of the tiles due to debonding caused by
other factors than debris impact, on any flight, would be a little more than 0.02, which
then implies that over 35 flights, the probability of losing at least one tile on one of the
flights is a little less than 55%. This appears reasonable based on historical events
and the one missing tile.

Figure 16 source: R. Welling, Lockheed Corporation (1989). Reproduced by permission.
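The arithmetic behind these two figures can be sketched as follows. The per-tile probabilities are those quoted above; the number of black tiles on one orbiter is not stated in this section, so the value of 20,000 used here is purely an assumption chosen for illustration, and the variable names are ours.

```python
# Per-tile, per-flight debonding probabilities quoted in the text.
p_unbonded = 9.0e-7     # tiles held only by the gap fillers
p_other_modes = 2e-7    # screed reversion and load-cycle weakening
p_tile = p_unbonded + p_other_modes   # the text adds the two contributions

# ASSUMPTION: number of black tiles on one orbiter (not given in this section).
n_tiles = 20_000
n_flights = 35

# Probability of losing at least one tile on a single flight,
# treating tile failures as independent.
p_per_flight = 1.0 - (1.0 - p_tile) ** n_tiles

# Probability of at least one such loss somewhere in n_flights flights.
p_over_flights = 1.0 - (1.0 - p_per_flight) ** n_flights

print(round(p_per_flight, 4), round(p_over_flights, 3))
```

Under this assumed tile count, the per-flight value comes out slightly above 0.02 and the 35-flight value slightly below 0.55, in line with the statements in the text. A different tile count would shift both numbers.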


3.3 PRA model: definition of variables

Throughout the rest of the analysis, the areas defined in the previous section
are indexed as follows:

i:  Index of min-zones
h:  Index of debris areas
j:  Index of functional criticality areas
k:  Index of burn-through areas
l:  Index of secondary tile loss areas


Note that a double subscript (e.g., j_i) represents parameter j (criticality in this
case) of min-zone i, and that the term "debonding" refers to "debonding due to factors
other than debris impact".


n:       Total number of black tiles on the orbiter
n_i:     Number of tiles in min-zone i
N:       Total number of min-zones
N_i:     Number of failure patches in min-zone i
q:       Index for the failure patches in any min-zone
M:       Final number of tiles in any failure patch
m:       Index for the number of tiles in a failure patch
Ft:      Initiating failure of a tile
Fa|Ft:   Failure of any adjacent tile given initiating failure
D:       Number of adjacent tiles in initial debris area
S:       Number of adjacent tiles in initial debonding area
L:       Loss of shuttle (LOV)
P(X):    Probability of event X
P(X|Y):  Probability of event X conditional on event Y
P(X,Y):  Joint probability of event X and event Y
EV(Z):   Expected value of random variable Z


This analysis follows closely the structure of variables described in Figure 9.
Two types of initiating events are considered: those caused by debonding, and those
caused by debris impact. (A third category, failure of the tile itself due to heat loads,
may be added later.) It is assumed that the two types of initiating events are
probabilistically independent. Since each min-zone has its own set of characteristics,
they are treated as separate entities. Tiles in each specific min-zone have the same
probability of being initially damaged and of causing a larger failure patch,
burn-through, damage to a critical system, and the loss of the vehicle. Because of
these assumptions, the analysis determines first the probability of losing the vehicle
for each type of initiating event and each min-zone. The overall failure probability is
the sum of the failure probabilities for all zones and initiating events. Debris impacts
are considered first.

3.4 Initiating event: initial debris impact on one tile only (D=1)

To determine the probability that a specific tile in min-zone i starts a patch
due to debris impact, it is also necessary to consider the size of the initial damage.
We consider first the case where a single tile is initially damaged. Throughout
Section 3.4, it should be remembered that the probability of initial tile failure in
min-zone i, $P_i(\mathrm{Ft})$, should be read as $P_i(\mathrm{Ft} \mid D=1)$. The next sections
consider $P_i(\mathrm{Ft} \mid D=2)$ and $P_i(\mathrm{Ft} \mid D=3)$. These additional levels of initial
damage (two and three tiles simultaneously) are combined later.

Once the first tile in min-zone i is lost due to debris, there is the potential for
adjacent tiles to also fail. The probability that the final patch size reaches M depends
on the secondary loss index of the min-zone ($l_i$) and is given by the following
geometric distribution (which means that M-1 additional tiles fail and no adjacent tile
fails afterwards):

$$P_i(M \mid \mathrm{Ft}) = P_{l_i}(\mathrm{Fa} \mid \mathrm{Ft})^{M-1} \times \left[1 - P_{l_i}(\mathrm{Fa} \mid \mathrm{Ft})\right] \tag{1}$$

Note that M must be at least equal to 1. This equation assumes that the
probability that adjacent tiles debond does not change as the patch grows.

In each min-zone, there is the possibility of several patches starting. The
probability that the number of patches reaches $N_i$ in min-zone i is:

$$P_i(N_i) = \frac{n_i!}{N_i!\,(n_i - N_i)!} \times P_i(\mathrm{Ft})^{N_i} \times \left[1 - P_i(\mathrm{Ft})\right]^{n_i - N_i} \tag{2}$$

This formulation assumes that the initial tile failures are independent, and that
there will be no overlapping of patches because the probability of an initiating event
(Ft) is small compared to the number of tiles in each min-zone ($n_i$). The product
$EV(N_i) \times EV(M)$, which equals the total number of tiles lost in each min-zone, is
considered negligible compared to $n_i$. Also, $N_i$ (number of patches) and M (size of
patches) are considered independent random variables. Based on these
assumptions, the expected number of patches is approximately:

$$EV(N_i) = n_i \times P_i(\mathrm{Ft}) \tag{3}$$

and the size of each patch is given by the mean of the distribution of M:

$$EV(M) = \frac{1}{1 - P_{l_i}(\mathrm{Fa} \mid \mathrm{Ft})} \tag{4}$$
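As a quick numerical check of Equations (1) and (4), the sketch below evaluates the geometric patch-size distribution with the Zone 1 value of Table 5 and compares its mean with the closed form. The function names are ours, and the infinite sums are truncated at a patch size where the remaining terms are negligible.

```python
def patch_size_pmf(m: int, p_adjacent: float) -> float:
    """Equation (1): probability that a patch started by one tile ends at size m."""
    if m < 1:
        return 0.0
    return p_adjacent ** (m - 1) * (1.0 - p_adjacent)


def expected_patch_size(p_adjacent: float) -> float:
    """Equation (4): mean of the geometric patch-size distribution."""
    return 1.0 / (1.0 - p_adjacent)


p_adjacent = 1e-2  # Table 5, Zone 1 (high loads)

# Truncated sums; terms beyond m = 200 are far below floating-point resolution here.
total_probability = sum(patch_size_pmf(m, p_adjacent) for m in range(1, 201))
mean_by_summation = sum(m * patch_size_pmf(m, p_adjacent) for m in range(1, 201))

print(total_probability, mean_by_summation, expected_patch_size(p_adjacent))
```

With a secondary loss probability of 10^-2, the expected patch size is only about 1.01 tiles, which illustrates why patches are assumed to remain small relative to a min-zone.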


Given this result, it is now possible to calculate the probability that the orbiter
will fail due to debris that impacts one tile only. Remembering that j is the index of the
criticality areas and k is the index of the burn-through areas, we define the
probabilities of orbiter failure due to a patch of size M, in min-zone i, initiated by
debris impact (D=1) as follows:

$$\begin{aligned}
P_i(L \mid M=1) &= p_{j_i k_i,\,1} \\
P_i(L \mid M=2) &= p_{j_i k_i,\,2} \\
&\;\;\vdots \\
P_i(L \mid M=m) &= p_{j_i k_i,\,m}
\end{aligned} \tag{5}$$

It must be remembered that any given min-zone could have several patches
in it, and each patch could be of a different size. To calculate the probability of
orbiter loss due to a specific number of patches ($N_i$) in min-zone i, the following
definition is necessary. Let $p'_i$ be the probability that an arbitrary patch in min-zone i
causes a failure.

$$p'_i = \sum_{m=1}^{\infty} p_{j_i k_i,\,m} \times P(\text{patch size} = m) \tag{6}$$

$$p'_i = \sum_{m=1}^{\infty} p_{j_i k_i,\,m} \times P_{l_i}(\mathrm{Fa} \mid \mathrm{Ft})^{m-1} \times \left[1 - P_{l_i}(\mathrm{Fa} \mid \mathrm{Ft})\right] \tag{7}$$
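Equation (7) can be evaluated numerically once the conditional loss probabilities for each patch size are specified. The sketch below makes two assumptions that are ours, not the report's: (a) the loss probability for a patch of size m is taken as the product of a burn-through probability from Table 4 and a loss-given-burn-through probability from Table 6, and (b) patches larger than three tiles reuse the three-tile burn-through value, since Table 4 stops there. Function and parameter names are hypothetical.

```python
def patch_failure_probability(
    p_adjacent: float,
    p_burn_by_size: dict[int, float],
    p_lov_given_burn: float,
    max_size: int = 200,
) -> float:
    """Truncated evaluation of Equation (7) for one min-zone.

    p_burn_by_size maps patch size to P(burn-through | patch size); sizes above
    the largest tabulated size reuse the largest tabulated value (ASSUMPTION).
    """
    largest_tabulated = max(p_burn_by_size)
    total = 0.0
    for m in range(1, max_size + 1):
        p_burn = p_burn_by_size[min(m, largest_tabulated)]
        # ASSUMPTION: loss probability given size m is burn-through times LOV given burn-through.
        p_loss_given_size = p_burn * p_lov_given_burn
        p_size = p_adjacent ** (m - 1) * (1.0 - p_adjacent)  # Equation (1)
        total += p_loss_given_size * p_size
    return total


# Illustrative min-zone: high burn-through area (Table 4), high criticality
# (Table 6), high secondary-loss zone (Table 5).
p_prime = patch_failure_probability(
    p_adjacent=1e-2,
    p_burn_by_size={1: 0.2, 2: 0.7, 3: 0.95},
    p_lov_given_burn=0.8,
)
print(round(p_prime, 5))
```

Because the secondary loss probability is small, the result is dominated by the single-tile term (0.2 x 0.8 x 0.99), with the larger patches contributing only a few percent.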

Therefore, q being the number of patches in a given min-zone, the failure
probability for a specific number of patches in a min-zone is:

$$P_i(L \mid N_i = q) = p'_i \times q \tag{8}$$

Once again, this assumes that the probabilities are small and that the patches
will  not  interfere  with  each  other  (they  are  assumed  to  be  separate  and 
independent).  These  assumptions  are  valid  providing  that  each  min-zone  has  a 
sufficiently  large  number  of  tiles  and  that  the  size  of  the  patches  is  relatively  small. 

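Based on Equation (8), the probability of orbiter failure given all patches that
occur in min-zone i becomes:

$$\begin{aligned}
P(L \mid \text{min-zone } i) &= \sum_{q=0}^{\infty} P_i(L \mid N_i = q) \times P_i(N_i = q) \\
&= \sum_{q=0}^{\infty} p'_i \times q \times P_i(N_i = q) \\
&= p'_i \times EV(N_i) \\
&= p'_i \times n_i \times P_i(\mathrm{Ft})
\end{aligned} \tag{9}$$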



This result represents only the cases of debris impact causing the initial
failure of a single tile. A more complete rewriting of Equation (9) highlights this fact:

$$P(L \mid \text{min-zone } i, D=1) = p'_i(D=1) \times n_i \times P_i(\mathrm{Ft} \mid D=1) \tag{10}$$
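To see how good the small-probability linearization behind Equations (8) to (10) is, the sketch below compares the product form with the value obtained when each tile is treated as an independent chance of starting a fatal patch. The comparison formula, the tile count, and the patch failure probability are our illustrative assumptions; only the initiating probability is taken from Table 3.

```python
import math


def zone_loss_linear(p_patch_fails: float, n_tiles: int, p_init: float) -> float:
    """Equation (10): p'_i x n_i x P_i(Ft | D=1)."""
    return p_patch_fails * n_tiles * p_init


def zone_loss_independent(p_patch_fails: float, n_tiles: int, p_init: float) -> float:
    """Probability that at least one tile starts a fatal patch, tiles independent.

    ASSUMPTION: this comparison model is ours; it is not a formula from the report.
    Uses log1p/expm1 to stay accurate for very small probabilities.
    """
    p_fatal_per_tile = p_init * p_patch_fails
    return -math.expm1(n_tiles * math.log1p(-p_fatal_per_tile))


p_patch_fails = 0.164   # illustrative p'_i
n_tiles = 1000          # hypothetical min-zone size
p_init = 1.3e-6         # Table 3, high debris area, single tile lost

linear = zone_loss_linear(p_patch_fails, n_tiles, p_init)
independent = zone_loss_independent(p_patch_fails, n_tiles, p_init)
print(linear, independent)
```

For probabilities of this size the two values differ by roughly one part in ten thousand, which supports treating the min-zone result as a simple product.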

3.5 Initiating event: initial debris impact on several tiles (D=d)

* In  order  to  expand  tl'iu  model  to  include  the  possibility  that  the  initial  debris 
impact  damages  more  thar  : r & tile,  it  is  necessary  'o  modify  some  of  the  above 
equations.  It  Is  assumed  "at  if  a large  enough  pi*  20  of  debris  hits  the  orbiter, 
several  adjacent  tiles  may  h;  knocked  loose  at  once:  Each  of  these  missing  tiles 

may  in  turn  cause  their  adjscunt  tiles  to  fail  and  a specific  number  of  additional  tiles 
can  fail  in  multiple  ways.  Th;  refore,  additional  summations  are  required  in  order  to 
account  for  the  increased  umber  of  exposed  tiles  This  compounded  problem 
requires  that  Equation  (1)  :n  rewritten  to  account  for  this:  potentially  larger  patch 
growth  rate.  If  the  initial  darings  involves  two  tiles,  the?  probability  that  the  final  patch 
reaches  size  M is: 


Pj  (M|Ft,  D=2)  = (M-l  : 1)  x PH  (Fa|Ft)M*2  x [l  • Pjj(FaiFt)]2  (11) 

If  three  tiles  are  darr.arjed  initially: 

Pi  (M|Ft,  D«3)  * [ I ] x Pij  (Fa|Ft)M-3  x [l  - PijfFaj  Fi)]3  (12) 

it*'; 

If  four  tiles  are  damsqud  initially: 

KM-  * ^ 

Pj  (M|Ft,  D=4)  = [S  X,  i ] x P|j  (FalFt)M*4  x [i  - FijfFajFt)]4  (1 3) 

k-i  M 
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This  set  oi  equations  can  be  extended  to  include  greater  initial  damage; 
historical  evidence,  however,  supports  limiting  the  analysis  to  this  level.  It  must  be 
remembered  that  the  value  M of  the  final  patch  size  must  always  be  at  least  equal  to 
the  size  of  the  initial  damage  area,  D.  Equation  (2)  in  its  most  general  form  is  written: 

$$P_i(N_i \mid D=d) = \frac{n_i!}{N_i!\,(n_i - N_i)!} \times P_i(F_t \mid D=d)^{N_i} \times \left[1 - P_i(F_t \mid D=d)\right]^{n_i - N_i} \tag{14}$$

and  Equation  (3)  becomes: 


$$EV(N_i) = n_i \times P_i(F_t \mid D=d) \tag{15}$$
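
The binomial form and its expected value can be illustrated as follows. This is a sketch under stated assumptions: n_tiles plays the role of n_i (tiles in the min-zone), n_fail the role of N_i, and p_ft the role of P_i(Ft|D=d). The function names and the numerical value of p_ft are hypothetical.

```python
from math import comb


def prob_initial_failures(n_fail: int, n_tiles: int, p_ft: float) -> float:
    """Binomial probability that exactly n_fail of n_tiles tiles fail initially."""
    return comb(n_tiles, n_fail) * p_ft ** n_fail * (1.0 - p_ft) ** (n_tiles - n_fail)


def expected_initial_failures(n_tiles: int, p_ft: float) -> float:
    """Expected number of initial failures: n_tiles * p_ft."""
    return n_tiles * p_ft


if __name__ == "__main__":
    n_tiles, p_ft = 156, 1e-3  # p_ft is a hypothetical value
    mean_from_pmf = sum(
        k * prob_initial_failures(k, n_tiles, p_ft) for k in range(n_tiles + 1)
    )
    print(mean_from_pmf, expected_initial_failures(n_tiles, p_ft))
```

The mean computed from the full distribution agrees with the closed-form product, which is why only the expected value is carried forward into the later equations.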

Equations (5) and (6) do not change except for the indexing of the summation
since their results depend only on the final patch size and the functional criticality
index. Equation (7) would change as Equations (11) to (13) are integrated to
account for the various debris damage areas. The final probability for each initial
damage area and min-zone is computed using a variant of Equation (10):

$$P(L \mid \text{min-zone } i,\ D=d) = p'_i(D=d) \times n_i \times P_i(F_t \mid D=d) \tag{16}$$

Because  all  the  initial  damage  probabilities  are  very  small,  it  is  possible  to 
approximate  the  probability  of  debris  causing  loss  of  an  orbiter  for  all  damage  areas 
in  a particular  min-zone  by: 

$$P(L \mid \text{min-zone } i,\ \text{debris}) = \sum_{d=1}^{\text{Max } d} P(L \mid \text{min-zone } i,\ D=d) \tag{17}$$

Once this probability is determined, the probability of orbiter failure for all
min-zones due to debris impact is simply the sum of the probabilities of failure for all
min-zones since all min-zones and initiating events are assumed to be independent:

$$P(L \mid \text{debris}) = \sum_{i=1}^{N} P(L \mid \text{min-zone } i,\ \text{debris}) \tag{18}$$
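
The nested summation over initial damage sizes and min-zones can be sketched as below. All numerical inputs are hypothetical placeholders rather than values from the study, and p_prime simply stands for the factor written p'_i(D=d) in Equation (16); the data layout and function names are assumptions made for illustration.

```python
# Hypothetical inputs: for each min-zone, the number of tiles and, for each
# initial damage size d, the pair (p_prime, p_ft) used in Equation (16).
ZONES = {
    "zone_a": {"n_tiles": 156, "by_damage": {1: (0.02, 1e-5), 2: (0.05, 2e-6)}},
    "zone_b": {"n_tiles": 780, "by_damage": {1: (0.01, 1e-6)}},
}


def loss_given_damage(p_prime: float, n_tiles: int, p_ft: float) -> float:
    """Equation (16): contribution of one min-zone and one damage size."""
    return p_prime * n_tiles * p_ft


def zone_loss_debris(zone: dict) -> float:
    """Equation (17): sum over all initial damage sizes in one min-zone."""
    return sum(
        loss_given_damage(p_prime, zone["n_tiles"], p_ft)
        for p_prime, p_ft in zone["by_damage"].values()
    )


def orbiter_loss_debris(zones: dict) -> float:
    """Equation (18): sum over all min-zones."""
    return sum(zone_loss_debris(zone) for zone in zones.values())


if __name__ == "__main__":
    for name, zone in ZONES.items():
        print(name, zone_loss_debris(zone))
    print("all zones:", orbiter_loss_debris(ZONES))
```

Plain addition is used at both levels, which is acceptable only because every term is very small; with larger probabilities the sums would overstate the chance of loss.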

3.6 Initiating event: Debonding due to factors other than debris impact

The same procedure and basic formulas are used to determine the
probability of orbiter failure due to debonding caused by factors other than debris
impact. Again, the probability of orbiter failure due to failure of the TPS is computed
from the probability of tiles spontaneously debonding in groups of various sizes in
each min-zone. The problem is slightly easier since it is assumed that the likelihood
of such debonding is uniform across all tiles. The probability of secondary tile failure
Pi(Fa|Ft) is the same as for the debris problem. The probability of orbiter failure
based on all patches in min-zone i that started from a damage area of initial size s is
given by:

$$P(L \mid \text{min-zone } i,\ S=s) = p'_i(S=s) \times n_i \times P_i(F_t \mid S=s) \tag{19}$$

The other equations follow accordingly. The total probability of shuttle failure
for damage initiated by debonding caused by factors other than debris impact is:

$$P(L \mid \text{debonding}) = \sum_{i=1}^{N} P(L \mid \text{min-zone } i,\ \text{debonding}) \tag{20}$$

Finally, assuming independence of initiating events (debris and debonding
due to other causes), the overall probability of shuttle failure per flight due to tile
damage is:

$$P(L \mid \text{tile problem}) \approx P(L \mid \text{debonding}) + P(L \mid \text{debris}) \tag{21}$$
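
The size of the error made by adding the two contributions can be checked directly. The sketch below compares the simple sum with the exact probability that at least one of two independent events occurs, 1 - (1 - a)(1 - b). The two inputs are the column totals reported in Table 8 (in multiples of 10^-4); the function names are hypothetical.

```python
# Column totals from Table 8, expressed as probabilities per flight.
P_DEBRIS = 5.09e-4
P_DEBONDING = 6.79e-4


def combine_by_sum(p_a: float, p_b: float) -> float:
    """Additive combination of two small probabilities."""
    return p_a + p_b


def combine_exact(p_a: float, p_b: float) -> float:
    """Exact probability that at least one of two independent events occurs."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


if __name__ == "__main__":
    approx = combine_by_sum(P_DEBRIS, P_DEBONDING)
    exact = combine_exact(P_DEBRIS, P_DEBONDING)
    print(f"sum   = {approx:.6e}")
    print(f"exact = {exact:.6e}")
    print(f"difference = {approx - exact:.2e}")
```

The difference equals the product of the two probabilities, which is several orders of magnitude below either term, so the additive form loses nothing at the precision of this analysis.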


Paté-Cornell and Fischbeck


3.7  Additional  information  and  data 

A PRA  model  like  the  one  described  above  needs  to  be  constantly  updated  to 
reflect  information  that  may  have  existed  before  but  had  not  been  uncovered  at  the 
time  of  this  initial  study,  and  information  from  new  experience  including  recent 
inspections,  tests,  evaluations,  studies,  and  in-flight  performance  data.  In  this 
implementation  phase,  more  refined  data  may  thus  be  used  and  additional 
information  available  at  NASA  can  be  introduced  in  the  analysis.  One  important  part 
of  the  problem  at  that  stage  will  be  to  capture  the  evolution  of  the  failure  probability  of 
the orbiter. Clearly, the system is not in a steady state. On one hand, the quality of
the  maintenance  work  appears  to  improve  (Figure  17).  Initial  defects  of  the 
installation  work  that  resulted  in  a decrease  of  the  tile  capacity  are  progressively 
being  discovered  and  corrected  during  successive  maintenance  operations.  Existing 
problems,  such  as  the  impact  of  chunks  of  insulation  from  the  ET  and  the  SRBs  or  the 
elevon-cove  design  problem,  are  resolved  as  they  are  discovered.  On  the  other 
hand,  the  possibility  of  long-term  deterioration  of  the  TPS  clearly  increases  the 
probability  of  tile  failure  (even  if  slowly)  and  the  rate  of  deterioration  is  a major 
unknown.  Of  specific  concern  are:  the  possibility  of  degradation  of  the  bond  over 
time,  of  slow  chemical  reaction  due  to  water  proofing  agent,  and  of  weakening  of  the 
SIP/tile  system  under  exposure  to  repeated  load  cycles.  Additional  data  regarding 
the  initial  test  results  used  in  the  certification  procedure  from  JSC  and  from  the 
manufacturers of the tiles, the SIPs, and the bond are needed to update the model.
Therefore, this updating should be based not only on statistical data on tile
performance during each flight, but also on basic information about the components
of  the  TPS. 

A complete  analysis  of  the  distribution  of  tile  capacities  will  require  additional 
data  from  maintenance  operations  including: 

° The numbers of tiles replaced so far on each orbiter;

° A statistical distribution of the percentage of the surface of the tile/SIP
system that was found to be actually bonded to the orbiter's skin;

° Estimates of the probability of failure of a tile of given capacity (e.g., 10%
bonded) under different kinds of load (e.g., debris hit >1").

Source: D. Weber, Lockheed Corporation (1989)

A more refined partition of the orbiter's surface can be obtained using data
such as:

° Effect of excessive step and gap on the heat load in different locations;

° Possibility of partial failure of the guidance system or control surfaces at
re-entry and corresponding increase in the heat load;

° Trajectories of debris from the ET and the SRBs. Computer simulations
done at JSC (see Figures 18 and 19) could give better information about
the vulnerability of the orbiter's TPS, in particular in the most risk-critical
areas;

° Measurements of temperatures and aerodynamic forces on the surface of
the orbiter (see Figures 20 and 21);

° Effect of tile loss on the orbiter's surface temperature in the cavity (Figure 22).

The analysis itself can be refined in several ways. A major unknown is the
performance of the subsystems under the orbiter's skin once they are exposed to
excessive heat loads due to TPS failure. The only alternative, short of a systematic
PRA of these individual systems, is to use subjective estimates. Finally, it seems that
the availability of a kit for in-orbit repair of the tiles might provide a significant
reliability gain. An assessment of its effectiveness will be included in Phase 2 of this
study.


Ascent Debris Trajectory Simulation
Source: R. Gomez, NASA JSC (1988)

Figure 20: Thermal measurements (bottom view)
Source: Structural & Aerodynamic Pressure Measurement Locations, JSC 17889

Figure 21: Surface measurements (bottom view)
Source: Structural & Aerodynamic Pressure Measurement Locations, JSC 17889

Section 4:

ILLUSTRATION  OF  THE  MODEL 

The  illustration  of  the  model  presented  here  is  based  on  coarse  numbers 
whose  relative  values  are  more  significant  than  their  absolute  values.  By  overlaying 
the  functional  criticality,  burn-through,  debris  damage,  and  secondary  tile  loss  areas, 
33  min-zones  were  established.  Of  these,  21  are  unique  zones  (i.e.,  that  have 
different  sets  of  indices).  Several  zones  with  the  same  combinations  of  indices 
appear  on  different  locations  on  the  orbiter.  Figure  23  shows  the  final  layout  of  the 
min-zones  and  the  numerical  results  of  the  model.  Each  zone  is  assigned  an 
identification  number.  The  lower  numbers  are  generally  assigned  to  more  critical 
areas.  Each  zone  is  also  identified  by  an  index  number  whose  digits  relate  to  the 
four  area  types  shown  in  Table  7: 

1st digit: Burn-through areas (1 high, 2 medium, 3 low, probabilities)

2nd digit: Functional criticality areas (1 high, 2 medium, 3 low, criticality)

3rd digit: Debris damage areas (1 high, 2 medium, 3 low, probabilities)

4th digit: Secondary tile loss areas (1 high, 2 low, probability)


Table 7: Structure of the indices of the min-zones shown in Figure 22 and Table 8.
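
The four-digit index can be decoded mechanically. The following sketch maps each digit to the level given in Table 7; the function name and dictionary keys are hypothetical labels chosen for illustration.

```python
THREE_LEVEL = {"1": "high", "2": "medium", "3": "low"}
TWO_LEVEL = {"1": "high", "2": "low"}

# Digit order follows Table 7.
FIELDS = (
    ("burn_through", THREE_LEVEL),
    ("functional_criticality", THREE_LEVEL),
    ("debris_damage", THREE_LEVEL),
    ("secondary_tile_loss", TWO_LEVEL),
)


def decode_min_zone_index(index: str) -> dict:
    """Translate a four-digit min-zone index such as '1131' into named levels."""
    if len(index) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} digits, got {index!r}")
    decoded = {}
    for digit, (name, levels) in zip(index, FIELDS):
        if digit not in levels:
            raise ValueError(f"digit {digit!r} is not valid for {name}")
        decoded[name] = levels[digit]
    return decoded


if __name__ == "__main__":
    print(decode_min_zone_index("1131"))
```

For example, index 1131 reads as high burn-through probability, high functional criticality, low debris damage probability, and high secondary tile loss probability.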

Table 8 lists the min-zones, and shows the number of tiles in each zone and
the probability of failure of the orbiter attributable to this zone. This value was
determined by calculating this probability for both initiating events and then summing
to obtain the results. The boundaries of the min-zones have been simplified: the
number of tiles in each area is only an approximation and is not based on an actual
count. The location description is only intended to provide a rough placement of the
Figure 23: Partition of the orbiter's surface into 33 min-zones (index: i)
                                                              P(LOV) x 10^-4
ID#  Index  Location                              # Tiles  Debris  Debond  Total
  1  1111   Right side, under crew                    156    0.87    0.36   1.23
  2  1111   Right side near main ldg gear (aft)       156    0.87    0.36   1.23
  3  1121   Right side near main ldg gear (fwd)       576    0.13    1.62   1.75
  4  1131   Left side near main ldg gear              780    0.00    1.87   1.87
  5  1211   Centerline under crew                     364    0.51    0.22   0.73
  6  1311   Left side, under crew                     312    0.11    0.04   0.15
  7  1311   Center of right elevon                    104    0.04    0.01   0.05
  8  1331   Center of left elevon                     104    0.00    0.00   0.00
  9  2112   Right side fwd mid edge                   524    1.73    0.75   2.48
 10  2121   Center of body flap                       208    0.02    0.24   0.26
 11  2131   Left wing, center                         468    0.00    0.56   0.56
 12  2311   Right side mid edge                      1664    0.30    0.13   0.43
 13  2311   Left side, mid edge                      1196    0.21    0.08   0.29
 14  2312   Left side, mid edge                       572    0.10    0.04   0.14
 15  2321   Right side nose                           277    0.01    0.02   0.03
 16  2321   Left wing, center                         832    0.01    0.06   0.07
 17  2321   Right side body flap                      104    0.00    0.01   0.01
 18  2321   Left side, body flap                      104    0.00    0.01   0.01
 19  2321   Right wing                               2132    0.18    0.16   0.34
 20  2331   Left side nose                            312    0.00    0.02   0.02
 21  2331   Left wing, fwd                           1768    0.00    0.13   0.13
 22  2332   Right elevon, outboard                    312    0.00    0.02   0.02
 23  3112   Right wing, center                        364    0.01    0.01   0.02
 24  3122   Left wing, center                         468    0.00    0.01   0.01
 25  3122   Center, payload bay fwd                  1664    0.00    0.02   0.02
 26  3132   Center, payload bay aft                  1976    0.00    0.02   0.02
 27  3132   Right wing, center                        468    0.00    0.01   0.01
 28  3222   Center, payload bay, mid                  520    0.00    0.00   0.00
 29  3312   Right elevon, inboard                     312    0.00    0.00   0.00
 30  3312   Right wing, center                        416    0.00    0.00   0.00
 31  3322   Left elevon in / center body flap         728    0.00    0.00   0.00
 32  3332   Left elevon, outboard                     572    0.00    0.00   0.00
 33  3332   Center, aft                              1040    0.00    0.00   0.00
            Totals                                           5.09    6.79  11.88

Table 8: Identification of the min-zones and their contribution to the probability of LOV
min-zone. No attempt has been made to use orbiter notations. The final numerical
results of the model are presented in the right-hand column as multiples of 10^-4. The
probability values are mostly in the order of 10^-4. Again, it is important to remember
that the importance of the numbers is not their magnitude, but their relative values
when compared to each other. According to our coarse numerical analysis, the total
probability of losing the orbiter on any given mission, due to TPS failure, is in the
order of 10^-3. It is interesting to note that approximately 40% of this probability is
attributable to debris-related problems and that 60% comes from problems of
debonding caused by other factors. By scanning the columns, it appears that a few
min-zones contain most of the risk.
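
The split between the two initiating events follows from the column totals of Table 8. The short sketch below recomputes the shares; the variable and function names are hypothetical.

```python
# Column totals from Table 8, in multiples of 1e-4.
TOTAL_DEBRIS = 5.09
TOTAL_DEBOND = 6.79


def shares(debris: float, debond: float) -> tuple:
    """Return the fraction of the total risk due to each initiating event."""
    total = debris + debond
    return debris / total, debond / total


if __name__ == "__main__":
    debris_share, debond_share = shares(TOTAL_DEBRIS, TOTAL_DEBOND)
    total_probability = (TOTAL_DEBRIS + TOTAL_DEBOND) * 1e-4
    print(f"total P(LOV) due to TPS failure: {total_probability:.3e}")
    print(f"debris share:    {debris_share:.1%}")
    print(f"debonding share: {debond_share:.1%}")
```

The computed shares are close to 43% and 57%, which the text rounds to roughly 40% and 60%, and the total of 11.88 x 10^-4 is of the order of 10^-3.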


Using a risk-per-tile measure, the min-zones can be ordered according to
their criticality with respect to the two types of initiating events, and to the total
probability of failure. The results are shown in Tables 9 and 10. Table 9 displays the
contribution of each min-zone and of each tile to the probability of LOV separated
into debris and debonding due to other factors. Table 10 shows the contribution of
each tile and each min-zone to the overall probability of LOV. In this table, we show
for each tile, a risk-criticality factor that is proportional to the relative contribution of
this tile to the overall failure probability, accounting not only for the loads applied to
this tile but also for the consequences should it fail. This risk-criticality factor is the
point of reference that will be used in the second phase of the study to set priorities
among different management measures designed to improve tile reliability.
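
The risk-per-tile measure and the 0-100 criticality scale can be reproduced for a few zones. This is a sketch under one stated assumption: that the scale is obtained by dividing each per-tile probability by the largest one and multiplying by 100, which is consistent with the values listed in Table 10 but is not spelled out in the text. The zone probabilities and tile counts for zones 1, 4, 5, and 10 are taken from Table 10; the function names are hypothetical.

```python
# (P(LOV) per zone in multiples of 1e-4, number of tiles), from Table 10.
ZONES = {
    1: (1.230, 156),
    4: (1.870, 780),
    5: (0.728, 364),
    10: (0.260, 208),
}


def risk_per_tile(zone_prob_e4: float, n_tiles: int) -> float:
    """Per-tile P(LOV) in multiples of 1e-8."""
    return zone_prob_e4 * 1e4 / n_tiles


def criticality_scale(zones: dict) -> dict:
    """Assumed scaling: 100 * (per-tile risk) / (largest per-tile risk)."""
    per_tile = {zid: risk_per_tile(p, n) for zid, (p, n) in zones.items()}
    worst = max(per_tile.values())
    return {zid: round(100.0 * value / worst) for zid, value in per_tile.items()}


if __name__ == "__main__":
    for zid, (p, n) in ZONES.items():
        print(zid, round(risk_per_tile(p, n), 1))
    print(criticality_scale(ZONES))
```

Note that zone 4 has a larger zone-level probability than zone 1 but a much lower criticality per tile, because its risk is spread over five times as many tiles.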

A slightly  different  graphic  representation  of  this  table  is  displayed  in  Figures 
24,  25,  and  26.  It  is  possible  from  our  results  to  identify  the  most  sensitive  min-zones 
by  ranking  them  by  order  of  individual  tile  criticality.  One  can  then  plot  the  marginal 
increase  of  the  failure  probability  for  each  added  min-zone,  the  slope  of  each 
segment  representing  the  (decreasing)  contribution  of  each  tile  to  the  failure 
probability.  Each  black  dot  represents  the  addition  of  the  next  most  critical  min-zone. 
The  greater  the  horizontal  spacing  between  the  dots,  the  larger  the  number  of  tiles  in 




            Debris                              Debonding
ID#   P(LOV)/zone   P(LOV)/tile       ID#   P(LOV)/zone   P(LOV)/tile
        0.00E-4       0.00E-8                 0.00E-4       0.00E-8
  1      0.870        55.770            4      1.870        24.000
  2      0.870        55.770            3      1.620        24.000
  9      1.730        27.720            1      0.360        23.100
  5      0.510        14.010            2      0.360        23.100
  6      0.110         3.365            9      0.750        12.000
  7      0.04          3.365           11      0.560        12.000
  3      0.130         1.923           10      0.240        11.500
 12      0.300         1.785            5      0.218         5.990
 13      0.210         1.781            6      0.045         1.440
 14      0.100         1.748            7      0.015         1.440
 10      0.020         0.961           15      0.023         0.829
 19      0.18          0.867           12      0.130         0.781
 23      0.010         0.274           16      0.065         0.781
 17      0.002         0.192           21      0.133         0.752
 18      0.002         0.192           14      0.043         0.752
 15      0.003         0.108           20      0.023         0.737
 16      0.008         0.096           22      0.023         0.737
  4      0.000         0.000           19      0.156         0.673
  8      0.000         0.000           17      0.007         0.673
 11      0.000         0.000           18      0.007         0.669
 20      0.000         0.000           13      0.080         0.137
 21      0.000         0.000           23      0.005         0.128
 22      0.000         0.000           24      0.006         0.128
 24      0.000         0.000           27      0.006         0.121
 25      0.000         0.000           26      0.024         0.114
 26      0.000         0.000           25      0.019         0.038
 27      0.000         0.000           28      0.002         0.000
 28      0.000         0.000            8      0.000         0.000
 29      0.000         0.000           29      0.000         0.000
 30      0.000         0.000           30      0.000         0.000
 31      0.000         0.000           31      0.000         0.000
 32      0.000         0.000           32      0.000         0.000
 33      0.000         0.000           33      0.000         0.000

Table 9: Probabilities of Loss of Vehicle due to tile failure initiated
(1) by debris damage and (2) debonding caused by factors other than debris,
for each min-zone, and each tile in each min-zone
ID#   P(LOV)/zone   P(LOV)/tile   Risk Criticality   Number of   Location
        0.00E-4       0.00E-8       0-100 scale        Tiles
  1     1.2300        78.800           100               156     rt under crew
  2     1.2300        78.800           100               156     rt main gear aft
  9     2.4800        39.700            50               524     rt fwd mid edge
  3     1.7500        25.900            33               576     rt main gear
  4     1.8700        24.000            30               780     lt main gear
  5     0.7280        20.000            25               364     center crew
 10     0.2600        12.500            16               208     body flap cen
 11     0.5600        12.000            15               468     lt/rt wng cen out
  6     0.1500         4.810             6               312     lt crew
  7     0.0500         4.810             6               104     rt elevon cen
 12     0.4270         2.570             3              1664     rt side mid edge
 14     0.1430         2.500             3               572     lt fwd mid edge
 13     0.2930         2.450             3              1196     lt middle
 19     0.3410         1.600             2              2132     rt wing
 15     0.0260         0.938             1               277     rt nose
 16     0.0730         0.877             1               832     lt wing outboard
 17     0.0090         0.865             1               104     body flap rt
 18     0.0090         0.865             1               104     body flap lt
 21     0.1330         0.752             1              1768     lt wing forward
 20     0.0230         0.737             1               312     lt nose
 22     0.0230         0.737             1               312     rt elevon out
 23     0.0150         0.412             1               364     rt wing center in
 24     0.0060         0.128            <1               468     lt wing center in
 27     0.0060         0.128            <1               468     rt wing cen out
 26     0.0240         0.121            <1              1976     center bay aft
 25     0.0190         0.114            <1              1664     center upper bay
 28     0.0020         0.038            <1               520     center mid bay
  8     0.0000         0.000            <1               104     lt elevon center
 29     0.0000         0.000            <1               312     rt elevon in
 30     0.0000         0.000            <1               416     rt wing cen
 31     0.0000         0.000            <1               728     lt elev/body flap
 32     0.0000         0.000            <1               572     lt elevon out
 33     0.0000         0.000            <1              1040     center aft

Table 10: Risk-criticality factor for each tile in each min-zone
the zone. Several small min-zones contain a large part of the risk (those with the
steepest slope), whereas several very large min-zones carry only a small part of the
risk (those with zero slope). Figure 23 shows the contribution of increasing
percentages of the tiles to risk for debris-initiated damage. Note that, for failures
initiated by debris, 80% of the risk is due to only 8% of the tiles. For debonding
problems that are not caused by debris, the contribution of increasing percentages of
tiles is shown in Figure 24: 80% of the risk is due to 13% of the tiles. Finally, the
overall result is shown in Figure 25: for the total risk, including both initiating events,
80% of the risk can be attributed to 14% of the tiles. It is important to remember that
the same tiles do not necessarily appear in the same order in each graph. Clearly,
some zones pose a much higher risk for one type of initiating event than for the other.
For example, min-zone 4, located near the left main gear, has not historically
experienced significant debris damage and is not on the obvious trajectory of
tractable debris; so, the probability of LOV due to TPS debris damage in that zone is
basically zero. There are, however, some critical components that are temperature
sensitive under the skin in that area; so, the risk of LOV due to debonding is
non-negligible (1.87 x 10^-4).
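
The cumulative curves described above can be built by ranking zones by per-tile risk and accumulating both tiles and risk. The sketch below uses four hypothetical zones, not the study's data, and works at zone granularity; the function names are assumptions made for illustration.

```python
# Hypothetical zones: (number of tiles, zone risk in arbitrary units).
EXAMPLE_ZONES = [(100, 5.0), (100, 3.0), (300, 1.5), (500, 0.5)]


def cumulative_risk_curve(zones: list) -> list:
    """Return (cumulative tile fraction, cumulative risk fraction) points,
    adding zones in decreasing order of risk per tile."""
    ranked = sorted(zones, key=lambda z: z[1] / z[0], reverse=True)
    total_tiles = sum(n for n, _ in ranked)
    total_risk = sum(r for _, r in ranked)
    curve, tiles, risk = [], 0.0, 0.0
    for n, r in ranked:
        tiles += n
        risk += r
        curve.append((tiles / total_tiles, risk / total_risk))
    return curve


def tile_share_for_risk(zones: list, target: float) -> float:
    """Smallest cumulative tile fraction whose zones carry at least `target` of the risk."""
    for tile_fraction, risk_fraction in cumulative_risk_curve(zones):
        if risk_fraction >= target - 1e-12:
            return tile_fraction
    return 1.0


if __name__ == "__main__":
    for point in cumulative_risk_curve(EXAMPLE_ZONES):
        print(point)
    print(tile_share_for_risk(EXAMPLE_ZONES, 0.80))
```

In this made-up example, 20% of the tiles carry 80% of the risk. Each point of the curve corresponds to one added min-zone, so the slope of each segment is that zone's risk per tile, as in the figures.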

Section 5:

EFFECTS OF ORGANIZATIONAL FACTORS ON TPS RELIABILITY:

MAIN PRELIMINARY OBSERVATIONS

5.1 Errors and risk

Well-bonded tiles are very unlikely to debond even under moderate debris
loads.  Given  the  temperature  gradients  measured  inside  the  tiles  during  flights,  it 
has  been  determined  that  the  tiles  absorb  most  of  the  heat  within  a fraction  of  their 
thickness  and  that  they  are  very  unlikely  to  burn,  even  considering  a wide  range  of 
re-entry  scenarios.  If  the  tiles  are  to  fail,  it  is  likely  to  be  because  they  have  been 
weakened  and/or  hit  by  debris.  The  problem  is  that  one  does  not  know  which  ones 
are  weak.  Human  errors  (past  and  present)  are  at  the  source  of  at  least  three  of  the 
fundamental  causes  of  tile  failure:  (1)  decrease  of  tile  capacity  because  of 
undetected  partial  or  weakened  bonding,  (2)  increase  in  the  heat  loads  due  to 
roughness of the orbiter's surface (caused, for example, by protruding gap fillers),
and (3) poorly-installed and maintained insulation on the SRBs and ET that flakes
off during ascent, damaging the TPS. These human errors are often the consequences
of the way the organizations (NASA and its contractors) operate.

In  the  second  phase  of  this  work,  we  will  explore  to  what  extent 
organizational  procedures  (for  instance,  those  that  induce  time  pressure  and 
turnover  of  the  personnel)  are  at  the  root  of  these  incidents.  Rules  that  apply 
uniformly  across  tiles  of  widely  variable  risk-criticality,  and  rules  that  do  not  account 
for  the  possibility  of  system  weakening  over  time  may  become  major  contributors  to 
the  overall  risk.  Furthermore,  the  scope  of  the  research  cannot  be  strictly  limited  to 
the  TPS.  Procedures  and  management  decisions  regarding  the  maintenance  of  the 
insulation  of  the  ET  and  the  SRBs  also  affect  the  reliability  of  the  tiles  since  they  are  a 
source of debris. Finally, in the long term, weakening of the tile system due to
repeated load cycles, exposure to environmental conditions on the ground, or
chemical reversion, may become a dominant factor of the failure risk. The problem
of deterioration over time may not be (and is not likely to be) of immediate concern
for well-bonded tiles, but may become a critical factor for those tiles whose capacities
have been reduced by defective installation and maintenance. Therefore, in the
second phase, we will examine closely the procedures of the organization, using our
PRA model to see how the relative contributions of each of these factors affect flight
safety.


In addition, the structure of the organization and its peripherals (NASA, plus
Lockheed, Rockwell, etc.) and the rules that determine the relations among these
organizations (for example in setting contracts, pay rates, and incentives, as well
as schedule and budget constraints) may also affect flight safety to the extent that
they determine the occurrence and severity of human errors and their probabilities of
detection. Some organizational improvements (which may have been recommended
before and ignored for various reasons) may have only a minor effect on the
reliability of the orbiter; others may be essential steps. Our analytical model will be
used to determine which of these factors actually affect the probability of failure of the
tiles (and consequently, of the orbiter) and by how much. Finally, the culture of the
organization may also play a role. As we describe below, the low status of the tile
work may induce low morale among some tile technicians. Furthermore, the
behaviors of other workers towards the tile technicians may be a significant source of
additional work load and time pressure.

Errors (most of which can be traced back to these organizational factors) can
be classified using a taxonomy which has been designed to guide the choice of
management improvements (Paté-Cornell, 1990). Errors are categorized into two
groups: gross errors (uncontroversial mistakes, for example, an unbonded tile) and
errors of judgment under uncertainty (for instance, the decision to live with a
problem that seems minor, but may not be so, until the next flight in order to
decrease the work load). Gross errors generally call for improvements of the hiring
and training procedures, inspection and quality control, and information flow; errors
of judgment generally require modification of incentives and rewards, improvement
in the treatment and communication of uncertainties, and adaptation of the resource
constraints.

5.2 Preliminary observations

In this preliminary phase, we identified the following factors as possibly
affecting the efficiency of tile risk management: (1) time pressures, (2) liability
concerns and conflicts among contractors, (3) turnover among tile technicians and
low status of tile work, (4) need for more random testing, and (5) contribution of the
management of the ET and the SRBs to TPS reliability problems. The study of these
factors will be the object of Phase 2 of this work. The foundation of this analysis
will be the risk-criticality of each tile so that limited resources (for example, the
limited number of tile inspectors) can be directed first where the probability and the
consequences of tile failure could be most severe.

5.2.1  Time  pressures 

Tile maintenance is often on the critical path to the next flight, especially after
missions where tile damage has been extensive. People who find themselves under
time pressures sometimes cut corners. For example, it was found in January 1989
that a tile technician had added water to the RTV mix in order to make it cure faster.
Adding water at that stage (or spitting in the RTV) may decrease the long-term
reliability of the bond: the catalytic reaction, which occurs during the curing, may
reverse earlier and thus increase the probability of debonding under different types
of loads. Time pressure is also probably the cause of more frequent errors, such as
the misalignment of the tile/SIP system with the filler bar, so that only a fraction of the
surface of the SIP is in contact with the orbiter's surface. Time pressures may be
unavoidable, but some organizational improvements may attenuate their effects,
first, by reducing them whenever possible and second, by increasing the quality
control in the most risk-critical zones.


The time pressure under which the tile personnel operates can be reduced in
several ways. First, automation of step and gap measurement (using laser devices
and automatic data recording systems currently under development) may result not
only in a significant reduction of the processing time, but also in a decrease of the
roughness of the orbiter's surface. Second, simplifying the paper work for the tile
technicians would allow them to spend more time working on the tiles and less time
shuffling papers (an apparent source of frustration). Third, it seems desirable to
avoid over monitoring. For example, imposing daily targets (as opposed to weekly
ones) for the number of tiles to be processed may decrease the variability and the
flexibility needed for optimal performance and system reliability. Fourth, time
pressure may be alleviated by reducing the access time to data bases and
information that is necessary for prompt maintenance decisions. The maintenance at
KSC is done by Lockheed, while some of the relevant data bases are controlled by
Rockwell. NASA may want to improve the transfer of information from one to the
other and/or within these two organizations.


5.2.2 Liability concerns and conflicts among contractors

Relatively harmonious relations have been instituted among the people who
work on the tiles. They share a common concern for the safety of the system despite
obvious sources of conflicts. Rockwell and Lockheed are in a competitive situation
which does not always provide incentives to make the other's work easier. Among
other factors, the liabilities of the main contractors are such that they occasionally
have incentives to withhold technical information (for legal and contractual reasons)
that may be useful (if not essential) for the performance of the other. These decisions
may be justified given the ways the contracts have been set. There are ways of
writing and handling contracts that improve incentives for cooperation and
encourage the sharing of relevant technical information. This implies that contracts
that affect the same subsystems (e.g., the tiles) and are signed with different firms
cannot be managed independently. The positive side of this competition among
contractors is that there are no incentives for complacency and strong motivations to
detect and correct errors made by the other. There are, however, strong incentives to
hide those made by one's own company.

5.2.3 Turnover among tile technicians and low status of tile work

The  turnover  among  the  tile  maintenance  personnel  is  high.  Because  tile 
technicians  are  classified  in  the  low-pay  category  of  material  (fiberglass)  technicians 
(a  practice  that  NASA  apparently  inherited  from  the  DoD),  many  of  them  leave  their 
tile  maintenance  jobs  shortly  after  completing  the  training  program  and  obtaining 
certification.  Organization  experts  generally  believe  that  high  turnover  is 
incompatible  with  learning  (individual  and  organizational)  and  optimal  performance. 
Therefore,  this  turnover  might  affect  TPS  safety  due  to  inferior  quality  work  by  less 
experienced  people.  Protruding  gap  fillers,  for  example,  are  caused  by  poor  quality 
installation  and  are  a probable  cause  of  early  boundary  layer  transition  (Smith, 
1989.)  This  condition  may  not,  in  itself,  threaten  flight  safety  unless  it  is  coupled  with 
other  factors.  It  does  decrease  the  overall  TPS  reliability  and  may  be  an  adverse 
result  of  high  turnover  and  the  corresponding  lack  of  experience  of  the  work  force. 
On  the  other  hand,  according  to  some  of  the  technicians,  the  old-timers  may  not  be 
as  respectful  of  "the  book"  as  the  newcomers.  Assessment  of  the  net  result  of 
inexperience  and  complacency  requires  a study  of  the  coupling  between  time  on  the 
job  and  occurrences  of  errors. 

The low-paying job factor may have other indirect, negative effects on the
reliability of the tiles. Because of the low consideration that other categories of
technicians seem to have for tile work, when doing other types of technical work on
the orbiter (e.g., mechanical or electrical) other workers do not pay sufficient
attention to the integrity of the tiles. They damage tiles frequently (if not seriously),
thus adding considerably to the tile maintenance work. Therefore, the low status of
the tile workers, grounded in their pay scale, may have several detrimental effects: (1)
a waste of money in training technicians who leave the job as quickly as possible,
(2) low morale for some of them, which is seldom conducive to high-quality work, and
(3) the "no respect" syndrome on the part of other technicians who carelessly
damage tiles. The result is an increase of time pressure for a system that is already
"the long pole" a large part of the time. In the end, these factors may encourage
detrimental corner-cutting in the processing.

5.2.4 Need for more random testing

The original tile work and subsequent maintenance work have not always
been perfect. Some of the tiles have been only partially bonded and, in a few
instances, not glued at all. For example, in November 1989, it was found that one
tile on orbiter Columbia had been holding for several flights by the friction of (or
perhaps some RTV adherent to) the gap fillers. The fact that this tile held and did not
cause an accident was called "a miracle" by the personnel who discovered the
problem. How "miraculous" can be determined using the risk assessment model. (In
fact, according to our estimates, the probability of debonding is 10^-2 per flight for such
a tile, making the probability of debonding in five flights on the order of 5%.) Because
of these hidden weaknesses, it may be desirable to do more random,
non-destructive pull tests of the black tiles between flights, focusing on the most
risk-critical areas of the orbiter's surface in order to detect and replace the tiles that
are far below the expected capacity.
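
The 5% figure quoted above follows from treating the five flights as independent trials with a debonding probability of 10^-2 each. The short sketch below reproduces that arithmetic; the function name is illustrative only, and the independence of flights is an assumption of the sketch, not a statement from the study.

```python
def prob_debond_within(n_flights: int, p_per_flight: float) -> float:
    """Probability of at least one debonding in n_flights flights,
    assuming (hypothetically) that flights are independent trials."""
    return 1.0 - (1.0 - p_per_flight) ** n_flights


# Unbonded tile held by gap fillers: 10^-2 per flight, five flights.
p5 = prob_debond_within(5, 1e-2)
print(f"{p5:.3f}")  # about 0.049, i.e. on the order of 5%
```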

In addition to the possibility that previous work may not have been perfect, the
possibility of long-term deterioration of the room-temperature vulcanized (RTV) bond
should be acknowledged and taken into account in maintenance procedures. This
calls (1) for additional random testing to monitor the possible chemical degradation
of the RTV after repeated heat-load cycles, and (2) for the development and
implementation of non-destructive and, if possible, non-pull testing of the tiles' bond,
to be applied in priority to the most risk-critical tiles.
5.2.5 Contribution of the management of the ET and the SRBs to TPS reliability

A significant  fraction  of  the  risk  of  TPS  failure  is  due  to  debris,  in  particular, 
pieces  of  insulation  from  the  external  tank  and  the  nose  cone  of  the  solid  rocket 
boosters. In addition, tiles are much more likely to debond under the shock of
chunks of debris when they are already loose or less than completely bonded. By
backtracking the computer-simulated trajectories of pieces of debris from the most
risk-critical parts of the orbiter surface back to the corresponding parts of the surface
of the ET and the SRBs, it may be possible to identify which parts of the surface of the
ET and the SRBs should be given special attention in the treatment of the insulation.
Additional  testing  should,  therefore,  be  performed  for  tiles  located  in  zones  that  are 
most  likely  to  be  hit  by  SRB  and  ET  insulation  debris. 

For  each  of  these  organizational  factors,  the  analytical  procedure  is  to  identify 
the  decisions  that  they  affect,  the  errors  that  they  can  cause,  the  frequency  with  which 
they  occur,  the  nature  and  the  severity  of  the  resulting  errors  as  a function  of  the 
severity  of  the  conditions,  and  their  effect  on  the  probability  of  failure  of  the  system 
using  our  PRA  model.  The  efficiency  of  possible  management  improvements  can 
then  be  roughly  assessed  so  that  efforts  are  concentrated  where  they  can  provide 
the  greatest  benefits.  This  assessment  will  be  the  objective  of  the  second  phase  of 
this  study. 
Section 6:

CONCLUSIONS 

The results of our model's illustration suggest that the probability of loss of an
orbiter due to failure of black tiles is in the order of 10^-3 with about 15% of the
tiles accounting for about f!0% of the risk. If one accepts the rough NASA estimates
that the probability of losing an orbiter is in the order of 10^-2 per flight (Broad, 1989)
and that a significant part of it is attributable to the main engines, then the proportion
of the risk attributable to the TPS (about 10%) is not alarming, but certainly cannot
be dismissed. (Our probabilities are coarse numbers that can be refined in the
second phase of the work, but they are probably in the ball park.) A critical issue is:
how will these probabilities evolve in the years to come? On one hand, the quality
of the tile work and the detection mechanisms for defective tiles are expected to
improve. On the other hand, exposure to repeated load cycles and environmental
conditions or chemical reaction may deteriorate the system's performance capacity
unless closely managed.

One of our key findings is that the most risk-critical tiles are not all in the
hottest areas of the orbiter's surface. We introduced, in this study, the notion of
risk-criticality and the computation of a risk-criticality index to account for the loads to
which the tiles are subjected and the consequences of their failures given their
location with respect to other critical subsystems which they protect (functional
criticality). This index can serve as a guide to set management priorities, for
example, for the gradual replacement of the tiles, focusing first where tile failure
could be most damaging.
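
The idea that the hottest zones are not necessarily the most risk-critical can be illustrated with a toy ranking. The sketch below is not the study's index: it assumes, purely for illustration, that a zone's index is the product of a load-driven probability of initiating tile failure and a probability of orbiter loss given failure at that location (functional criticality). All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    p_initiating: float          # hypothetical: probability of an initiating tile failure (loads)
    p_loss_given_failure: float  # hypothetical: probability of orbiter loss given that failure


def risk_index(zone: Zone) -> float:
    """Assumed illustrative index: load term times functional-criticality term."""
    return zone.p_initiating * zone.p_loss_given_failure


zones = [
    Zone("hot, non critical", 5e-5, 0.01),
    Zone("cold, critical", 1e-5, 0.20),
    Zone("hot, critical", 5e-5, 0.20),
    Zone("cold, non critical", 1e-5, 0.01),
]

# Rank zones from most to least risk-critical under the assumed index.
ranked = sorted(zones, key=risk_index, reverse=True)
for z in ranked:
    print(f"{z.name:20s} {risk_index(z):.1e}")
```

With these made-up inputs the cold but functionally critical zone ranks above the hot, non-critical one, which is the qualitative point made in the text.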

Well-designed, manufactured, bonded, and maintained tiles are extremely
unlikely to fail. A large portion of the risk seems to be attributable to tiles that are
only partially bonded, or to those that are not bonded at all and are held in place by
the gap fillers. Management assumes unnecessary risk by denying that errors have
occurred and will occur again and that, consequently, the capacity of the TPS is
reduced. To assume that all work is perfect leads to a potentially gross
underestimation of the risk, rendering the maintenance procedures based on this
assumption of perfection suboptimal. What the actual magnitude of this part of the
risk is and which organizational improvements can bring the greatest risk-reduction
benefits will be studied further in the second phase of this study. This part will
involve a systematic analysis of the maintenance process to identify the different
types of errors (past and present), their rates of occurrence, their probabilities of
detection and correction, and their severity levels (i.e., by how much they decrease
the system's capacity in each case). Relating these errors to the organizational
factors described in the previous section will allow us to identify management
improvements, their costs, and their expected positive effects on the TPS
performance.


After  the  completion  of  the  first  of  two  phases  of  research,  our  preliminary 
conclusions  are  that  it  is  desirable:  (1)  to  expand  the  current  concept  of  criticality  for 
the tiles (to include functional criticality, as well as the heat loads in a risk-criticality
measure), (2) to adapt the inspection and maintenance procedures to focus in
priority on the most risk-critical tiles, and (3) to modify the existing data bases to
include the risk-criticality factor for each tile.
ABSTRACT

This paper describes a probabilistic model which extends
classical PRA to include some characteristics of the organization
that processes or manages an engineering system. A taxonomy of
errors is presented and their organizational roots are examined. An
assembly model is proposed for the analysis of the resulting
spectrum of capacities of the system. The management of the
Thermal Protection System of the Space Shuttle is used as an
illustration. The model allows assessment of the benefits of
organizational improvements of the orbiter's processing.

PROCESS ANALYSIS IN RELIABILITY MODELS

The quantitative analysis of the reliability of an engineering
system such as a nuclear power plant or the space shuttle allows
identification of its different failure modes and computation of their
probabilities. Therefore, it provides a decision tool to choose
technical solutions that maximize an objective function (including
reliability) under resource constraints. This means, for instance, the
choice of design characteristics that minimize the probability of
failure during the lifetime of the system under constraints of cost,
time, and performance.

Technical modifications, however, represent only one class
of risk management strategies. When a system's failure is studied
in retrospect, it is often pointed out that what resulted in a technical
failure was actually rooted in a structural or functional feature of the
organization. This was the case, for example, of the accident of the
space shuttle Challenger where a number of organizational factors
contributed to NASA's decision to launch under unfavorable
temperature conditions. These organizational factors include, for
example, dispersion (thus, sometimes, loss of communications),
time constraints, and pressures of public relations. Modifications
and improvements of the organization itself may address
some of the reliability problems at a more fundamental level than
strengthening the engineering design alone. Such modifications
include, for example, improving communications, setting effective
warning systems, and ensuring consistency of standards across
the organization.

The object of this paper is to discuss a quantitative approach
to the analysis of the effects of organizational factors on system
reliability. The principle is to compute the probability of occurrence
of the basic events in greater depth than is generally done in
classical PRA by linking this probability to the industrial process
itself. The method involves explicit assessment of the effect of
management procedures on the probability of technical failures and,
therefore, allows an extension of the value of information of
conventional PRA. By assessing explicitly the reliability benefits of
organizational improvements along with technical ones, the results
allow setting priorities among safety measures that go beyond
technical modifications alone.

The National Aeronautics and Space Administration (NASA)
presents some organizational features that influence its mode of
operations and, thus, the reliability of its space systems. NASA is a
high-visibility organization, uncertain of its future funding and,
therefore, dependent on public relations. It is also fragmented in two
ways: geographically among space centers, and operationally
among space programs. In the early 1960's, NASA decided against
probabilistic risk analysis, thus avoiding the issue of "how safe is
safe enough" in what is generally recognized as a high-risk
operation. Yet, following the Challenger accident in January 1986
and the establishment of a long list of potential corrections, NASA is
beginning to complement its qualitative methods of identification of
the failure modes by quantifying probabilities and dependencies as
recommended by the Slay Commission. A current objective is clearly to
increase the effectiveness of the organization and the efficiency of
resource spending by setting priorities among the technical
solutions to existing problems. Yet, as the Rogers Commission pointed
out, it is clear that some of NASA's reliability problems cannot be
resolved by design modifications alone because their roots are
organizational. The fragmentation of the organization, the apparent
buffering between engineers and managers and the divergence of
their risk perceptions, the difficulties of learning given the scarcity of
usable trend data, all these factors have contributed to the
vulnerability of the space systems operations. Their effects, however,
vary among the different subsystems according to their physical
and functional characteristics and to the features of the managing
organizations.

The Thermal Protection System (TPS) of the space shuttle
provides an example of the coupling between technical and
organizational problems. It is a complex system that is designed,
manufactured, operated, and maintained by several organizations. It is
made of black and white tiles (about 24,000 on the orbiter
Discovery), reinforced carbon-carbon in the hottest zones, thermal
blankets in cooler zones, and flexible insulation. The tiles themselves
are attached by a bond (RTV) to a flexible pad designed to
absorb the bending of the orbiter's surface. The pads are bonded to
the aluminum skin (itself covered with a primer) by the same RTV.
The TPS can fail in three ways: debonding, burn-through, and
damage by impacts. It is subjected to a set of external loads, some
of them mostly predictable (like vibrations and heat under normal
operating conditions), some of them more random, like debris.
Important features of the PRA model for the tiles are the potential
failure dependencies from tile to tile, and the coupling between
failure of the TPS and failure of the subsystems located directly
under the aluminum skin of the orbiter.

The management of the TPS presents many characteristics
that are typical of the linkage between organizations and reliability.
It involves several organizations and contractors in different places
(including Rockwell, Lockheed, and NASA, at Kennedy Space
Center and at Johnson Space Center) and procedures that were
mostly developed for the initial shuttle construction and not for a
long-term maintenance program. The TPS inspection and
maintenance procedures are extremely labor intensive and time
consuming, and are often on the critical path to the next launch. The
training, dedication, and motivation of the personnel involved in this
process are critical to the reliability of the system. The current
procedure relies mostly on maintenance on demand. Although
destructive pull tests are performed for a small sample of tiles, in most
places, the problems posed by the aging of the bonding are not
addressed directly. The recording of operations involves a mass of
paper documents. Furthermore, the procedure involves some
prioritization among the TPS elements based on qualitative judgments,
but no systematic priorities based on a quantitative assessment of the
risk of failure due to tiles' location with respect to other critical
subsystems.

A new method to automatize the inspection of the tiles is
currently being implemented. An important aspect of this method
is that it greatly simplifies the current tasks of observing,
communicating, storing, and retrieving information concerning the current
state of the tiles and their past performance. It should, therefore,
increase the reliability of the inspection and maintenance
operations. By accelerating the process, automation may also, in many
instances, take the tiles off the critical path to the next launch. The
gain in shuttle reliability between manual inspection and automation
is a function (1) of the initial contribution of the TPS to the overall
failure risk and (2) of the gains made in TPS reliability. One specific
issue that can be addressed by the extension of PRA described
here is the benefit of accounting for the relative criticality of the tiles
in different locations on the orbiter's surface in the management of
the TPS. This may result in increasing maintenance efforts in key
areas such as the surface covering the hydraulic command system,
and also, perhaps, special monitoring of the installation operations
for these most critical areas. Another issue that can be addressed
by extension of PRA as described here is the relative importance of
the management of the TPS itself and of the management of other
systems that are sources of debris (e.g., the external tank
insulation) in the overall reliability of the thermal protection function.

INTEGRATION  MODEL 

Probabilistic risk analysis (PRA) for engineering systems
allows identification of their weakest parts through quantification of
the probabilities of the different failure modes (see, for example,
Henley and Kumamoto). Extension of the PRA model permits more
consideration of major organizational characteristics (structure,
procedures, and culture) that affect the reliability of operations,
especially in situations of distributed decision making. The
method extends the scope of PRA through a Bayesian analysis of
the sequence of tasks to be performed in the process of design,
manufacturing, inspection, maintenance, and operations, and the
computation of the probabilities of technical as well as
organizational failures that can affect the system's reliability. The reasoning
involves analysis and extension of errors to include not only the
classical operators' errors but also errors that are due to the
procedures and structure of the organization. An essential
distinction is made here between gross errors and errors of judgment
because remedial actions to address these two types of problems
may be of different nature.

The first phase is an analysis of the process (e.g.,
engineering, maintenance, and operation) in order to identify what
constitutes "normal performance" and potential problems with their
probabilities or base rates per time unit or per operation, which
depend, among other factors, on the organization's culture and
incentive structure. Given that a basic error occurs, the next phase
is an analysis of the organizational procedures and incentive
system to determine the probability that it is observed, recognized,
communicated, and corrected in time (i.e., before it causes a
system failure). The result of these two phases is a computation of
the probabilities of the different system's states corresponding to
possible types of structural defects and, therefore, to different levels
of system's capacity. The third phase is a probabilistic risk analysis
of the physical system that allows computation of the overall failure
probability (1) under normal circumstances, and (2) given potential
weaknesses of the different elements and increase of their failure
probabilities. These three models (process, organization, and PRA
for different levels of system's capacity) are integrated using an
event tree (or an influence diagram) to compute the overall failure
probability and the relative contribution of different scenarios (e.g.,
occurrence and correction of a given problem). Figure 1 provides a
schematic illustration of the structure of this integration model.
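
The three-phase integration described above can be sketched for the simplest case of a single error type and two system states. This is a minimal illustration, not the paper's model: the two-state simplification, the function name, and all numbers are assumptions made here for the example.

```python
def overall_failure_probability(
    p_error: float,               # phase 1: base rate of a basic error (hypothetical)
    p_detect_and_correct: float,  # phase 2: probability the error is caught in time (hypothetical)
    pf_nominal: float,            # phase 3: failure probability at normal capacity (hypothetical)
    pf_weakened: float,           # phase 3: failure probability at reduced capacity (hypothetical)
) -> tuple[float, float]:
    """Two-branch event tree: returns (overall failure probability,
    share of that probability contributed by the weakened state)."""
    # The system is weakened only if an error occurs AND it is not corrected.
    p_weakened = p_error * (1.0 - p_detect_and_correct)
    p_nominal = 1.0 - p_weakened
    p_fail = p_nominal * pf_nominal + p_weakened * pf_weakened
    share = (p_weakened * pf_weakened) / p_fail
    return p_fail, share


p_fail, share = overall_failure_probability(0.05, 0.8, 1e-4, 1e-2)
print(f"overall: {p_fail:.2e}, weakened-state share: {share:.0%}")
```

Raising `p_detect_and_correct` in this sketch is the analogue of an organizational improvement: the technical failure probabilities stay the same, but the weakened state becomes less likely.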

PRA  FOR  THE  THERMAL  PROTECTION  SYSTEM  OF  THE 
SPACE  SHUTTLE:  MODEL  STRUCTURE 

A PRA model currently under study for the TPS of the space
shuttle relies on a partition of the surface along several dimensions:
(1) the external loads (mainly heat and debris) to which the orbiter
can be subjected and that vary according to the location on the
orbiter's surface and (2) the criticality of the different subsystems
located immediately under the aluminum skin. In order to allow
recommendations regarding the management of the relevant
subsystems, the model is divided into two parts: the first part is a study
of debonding and burn-through due to weaknesses of the bond,
heat loads, vibrations, etc.; the second part is a separate study of
the impact of debris, their sources, and their effects on the TPS
reliability. In this paper, the scope of the PRA model is limited to the
tiles located on the underneath surface of the orbiter.

First part: debonding and burn-through (excluding the effect
of debris)

Figure 2 provides a schematic illustration of the partition of
the orbiter's underneath surface for the first part of the analysis
(there is no attempt at this stage to locate realistically the different
zones according to temperature and criticality). A minimal zone (or
min. zone) is an element of the final partition of the surface. Each
min. zone of index i is thus characterized by a heat index (k(i)) and
a criticality index (j(i)).

The basic notations are the following:


F: Failure of the orbiter: loss of vehicle and crew (LOV/C), primarily due to failure of the TPS

n: Total number of tiles on the orbiter

j: Index of criticality area (e.g., criticality of the min. zones covering the hydraulic system)

k: Index of temperature area

i: Index of min. zones; (j, k) = (j(i), k(i))

N_i: Number of failure patches in min. zone i

n_i: Number of tiles in min. zone i

F1: Failure of the first tile (initiating failure) in a failure patch

F'|F1: Failure of any adjacent tile given initiating failure

Initiation of a failure patch:

It is assumed in this phase of the analysis that any failure
patch (of size one or more) develops by the loss of a first tile (F1:
initiating failure for the patch), followed or not by the failure of
adjacent tiles (F'|F1). The probability of losing the first tile in a patch
depends on the failure mode (debonding or burn-through):

$$p_i(F1) = p(F1, \text{debonding}) + p_{k(i)}(F1, \text{burn-through})$$

The probability of debonding is assumed to be independent
of i (the location on the orbiter) whereas the second term
(burn-through) depends on the temperature component of the
min. zone description.

Development of a failure patch of size M given that it starts in min.
zone i:

$$p_i(M = m \mid F1) = p_{k(i)}(F' \mid F1)^{m-1} \times \left[1 - p_{k(i)}(F' \mid F1)\right]$$

This probability depends on the temperature of the min.
zone (index k(i)).

Distribution of the number of patches in min. zone i:

$$p(N_i = q) = \binom{n_i}{q}\, p_i(F1)^{q}\, \left[1 - p_i(F1)\right]^{n_i - q}$$

This equation assumes that the development of different
patches are independent events and that there is no overlap of
patches, i.e., that the product EV(N_i) x EV(M) is negligible.

$$EV(N_i) = n_i \times p_i(F1)$$

(expected value of the number of patches in min. zone i)


[Figure 2: Double partition of the orbiter's surface. Zone types shown: hot, critical; cold, critical; cold, non critical; hot, non critical.]
$$EV(M) = \frac{1}{1 - p_{k(i)}(F' \mid F1)}$$

(expected value of the size of a patch conditional on its start)
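
For readers who want the intermediate step, the expected patch size follows from the geometric form of the patch-size distribution. Writing $p$ as shorthand for $p_{k(i)}(F' \mid F1)$ (the shorthand is introduced here only for compactness):

$$EV(M) = \sum_{m=1}^{\infty} m \, p^{m-1} (1 - p) = (1 - p) \cdot \frac{1}{(1 - p)^{2}} = \frac{1}{1 - p}, \qquad 0 \le p < 1.$$

For instance, with a hypothetical $p = 0.1$, the expected patch size is $1/0.9 \approx 1.11$ tiles.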

Failure of the orbiter due to a patch of size M:

As part of the data, one needs the probability of failure of the
orbiter due to the development of a failure patch of a given size in
a zone of given criticality. These data may be obtained through an
analysis of the reliability of the systems located under the orbiter
surface and their contribution to the overall reliability of the orbiter.
These probabilities can be used to define criticality itself. p(F) thus
depends on j(i), the criticality index of min. zone i.

$$p_j(F \mid M = 1) = p_{j,1}$$
$$p_j(F \mid M = 2) = p_{j,2}$$
$$p_j(F \mid M = m) = p_{j,m}$$

Failure of the orbiter for N patches of random size:

A failure of the orbiter due to TPS failure in min. zone i occurs if any
(one or more) of the patches of min. zone i causes failure. Given that
failure probabilities p(F1) and p(F) are assumed to be small, one
can write:

$$p(F \mid N_i = q) = q \times p'_i$$

in which p'_i is the probability that an arbitrary patch in zone i causes
failure:

$$p'_i = \sum_{m=1}^{\infty} p_j(F \mid M = m) \times p(M = m) = \sum_{m=1}^{\infty} p_{j(i),m} \times p_{k(i)}(F' \mid F1)^{m-1} \left[1 - p_{k(i)}(F' \mid F1)\right]$$

Infinity is used as a convenient approximation of upper
bounds when the probability of large values of the random variable
is sufficiently small.

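Probability of orbiter failure due to TPS failure in zone i:

$$\begin{aligned}
p(F \text{ for all patches in min. zone } i)
&= \sum_{q=1}^{\infty} p(F \mid N_i = q) \times p(N_i = q) \\
&= \sum_{q=1}^{\infty} p'_i \times q \times p(N_i = q) \\
&= p'_i \times EV(N_i) \\
&= p'_i \times n_i \times p_i(F1)
\end{aligned}$$

Failure of the orbiter for all the min. zones:

$$p(F) = \sum_{i=1}^{4} p'_i \times n_i \times p_i(F1)$$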
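
The chain of formulas above (geometric patch size, expected number of patches per min. zone, sum over zones) can be evaluated numerically. The following is a minimal sketch under stated assumptions: the infinite sum over patch sizes is truncated at a finite `m_max`, and the zone data, class name, and function names are hypothetical placeholders rather than values or identifiers from the study.

```python
from typing import Callable, NamedTuple, Sequence


class MinZone(NamedTuple):
    n_tiles: int                               # n_i
    p_initiating: float                        # p_i(F1)
    p_spread: float                            # p_k(i)(F'|F1)
    p_loss_given_size: Callable[[int], float]  # m -> p_{j(i),m}


def patch_failure_probability(zone: MinZone, m_max: int = 200) -> float:
    """p'_i: probability that an arbitrary patch in the zone causes failure.
    The sum over patch sizes is truncated at m_max (an approximation)."""
    total = 0.0
    for m in range(1, m_max + 1):
        p_size = zone.p_spread ** (m - 1) * (1.0 - zone.p_spread)  # p(M = m | F1)
        total += zone.p_loss_given_size(m) * p_size
    return total


def orbiter_failure_probability(zones: Sequence[MinZone], m_max: int = 200) -> float:
    """p(F): sum over min. zones of p'_i * n_i * p_i(F1)."""
    return sum(
        patch_failure_probability(z, m_max) * z.n_tiles * z.p_initiating
        for z in zones
    )


# Hypothetical inputs for two zones; not data from the study.
example = [
    MinZone(1000, 1e-5, 0.1, lambda m: min(1.0, 0.05 * m)),    # critical zone
    MinZone(5000, 1e-5, 0.1, lambda m: min(1.0, 0.001 * m)),   # non-critical zone
]
print(f"{orbiter_failure_probability(example):.2e}")
```

Because the result is linear in `p_initiating`, the benefit of a maintenance improvement that lowers p_i(F1) in one zone can be read off by rerunning the sketch with the new value.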


$$p(F) = \int_x f_X(x)\, p(F \mid x)\, dx$$

In the complete analysis of the external events, it is
necessary to take into account the different phases of the flight in order to
obtain a distribution over time of loss of the first tile and a measure of
the dependence on time of the loss of subsequent tiles after loss of
the first one.

Second phase: risk of failure due to debris

The analysis begins with the study of the sources of debris
(e.g., insulation of the external tank, other parts of the STS, external
objects) in order to obtain the probability of different scenarios
characterized by the nature and the size of debris, the impact's
location on the orbiter's surface, and the time of impact during the
flight. This analysis leads to a description of the initial tile damage
(including probability of a hit for tiles in different zones, distribution
of number of tiles initially hit conditional on debris impact, severity
of the damage conditional on impact). In this second part, the start
of a failure patch is characterized by the possibility of multiple initial
failures with different levels of severity. The study of further
development of failure patches conditional on initiating failure(s) and
consequent effect on the orbiter is similar to the analysis performed
in the first part. The main difference is that the analysis of the effects
of debris involves different levels of damage severity.

MANAGEMENT OF THE TILES AND POTENTIAL ERRORS

TPS  management  and  reliability 

The quality of the process of design, manufacturing,
installation, inspection, and maintenance of the tiles affects the
probability of initial and subsequent failures through burn-through or
debonding (p(F1) and p(F'|F1) in the previous model). The quality of
the management of other systems such as the external tank that are
potential sources of debris affects the probability and the severity of
damage due to debris impact in different locations of the orbiter.
Given its structure, the model described above can be used to
assess the gains of improvements in the management of the tiles
and in the processing of the orbiter through the assessment of the
changes in p(F1), p(F'|F1), and similar variables for the case of debris
impact.


For example, current maintenance of the tiles depends on
the expected heat loads (with emphasis on zones such as the
leading edges of wheel doors); the procedure is independent of the
criticality of the systems located directly under the aluminum skin.
Prioritization in the TPS processing as well as the processing of
adjacent sources of debris may be designed to decrease further the
probability of initiating tile failures in the most critical zones. The
results can then be measured by computation of the overall risk by
the previous model using new values of initiating failure probabilities.
Another example of improvement that can be assessed through the model
is the development and the use of non-destructive testing of the
RTV. The probabilities of failure p(F1) and p(F'|F1) in the first part of the
model increase over time with the number of flights of the orbiter.
Non-destructive testing can indicate deterioration of the bonding
and allow timely replacement.


Effect of external events


The probability of failure is the sum over all values of the
external load X (e.g., maximum temperature if it turns out to be
critical) of the probability density function for X multiplied by the
probability of failure of the orbiter conditional on X.
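
The sum over load values described above is an integral of the load density times the conditional failure probability, and it can be approximated numerically. The sketch below uses a plain trapezoidal rule; the uniform load density and the quadratic fragility curve are hypothetical stand-ins chosen only to make the example checkable, not distributions from the paper.

```python
from typing import Callable


def failure_probability(
    density: Callable[[float], float],            # f_X(x), density of the external load
    p_fail_given_load: Callable[[float], float],  # p(F | x)
    lo: float,
    hi: float,
    steps: int = 10_000,
) -> float:
    """Trapezoidal approximation of the integral of f_X(x) * p(F | x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (
        density(lo) * p_fail_given_load(lo) + density(hi) * p_fail_given_load(hi)
    )
    for s in range(1, steps):
        x = lo + s * h
        total += density(x) * p_fail_given_load(x)
    return total * h


# Hypothetical example: normalized load uniform on [0, 1], fragility 0.01 * x^2.
p_f = failure_probability(lambda x: 1.0, lambda x: 0.01 * x**2, 0.0, 1.0)
print(f"{p_f:.5f}")  # close to 0.01 / 3
```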


In addition to conscious decisions such as ignoring the
aging phenomenon or uniform inspection of the tiles, errors can
occur at every step of the manufacturing of the different elements
of the TPS (for example, a bad batch of RTV), of the inspection and
maintenance process (e.g., wrong measurement of step and gap).
In cases where there is no controversy about value judgments involved
in top level decisions (considering, for example, that the opinions of
Congress must prevail), the question is to ensure that relevant
information is available to the top management when fundamental
decisions are made, and that the organizational and individuals' risk
attitudes eventually reflect that of this top level. The objective is
to design an incentive structure and/or a feedback mechanism that
ensures this adequacy. This implies the use of appropriate information
that is readily available, the acquisition of additional information
when it has a net positive value given the organization's preference
system, and a decision making process that leads to consistency in risk
attitudes. The quality of the leadership clearly plays an essential
part in the clarity and the consistency of standards across the
organization.

SOME ORGANIZATIONAL PROBLEMS THAT AFFECT SYSTEM RELIABILITY

From this analysis of errors one can identify two broad categories of
organizational problems that relate to the failure probability of a
system because they affect the probability of process errors:
information problems and incentive problems, with the possibility of a
combination of both.

Information  problems 

Information problems may occur within an organization or across
organizations managing the same system. They may include the
following:

* Sequential engineering and lack of feedback. The engineering process
may be designed in a linear manner without feedback loops to check that
the design corresponds to the needs, or that resources are allocated
properly for optimal reliability. For example, there may not exist any
mechanism to check the shadow price of the constraints set by
management, i.e., what would be the gains (e.g., in reliability)
associated with different levels of relaxation of the constraints
(e.g., of schedule).

* Access to relevant information. The organizational problem is to
identify and communicate signals that are relevant and reliable.
Organizational filters may be such that some important signals end up
missing while irrelevant ones overload and confuse the system. First,
the individual must be able to identify what to look for and to obtain
this information in time. Communications may fail for a variety of
reasons. Appropriate communication channels may simply not exist, or
existing channels may not work due to impractical procedures or
deliberate retention of information. Also, the signal may be ignored
because of previous false alerts (the cry-wolf effect).

* Communication of uncertainties. The information itself can be
distorted. For example, the organization may not be equipped (in its
procedures, its culture, etc.) to communicate properly imperfect
information and uncertainty. Therefore, qualifiers ("Go but...") may be
dropped in the process.

Incentive problems

Incentive problems may affect the system's performance throughout the
process and include the following:

* Incentives toward optimism. In organizations whose final goal is to
produce a positive product (as opposed to detecting faults) and where
the risks of visible failures are sufficiently low, incentives at each
level may lead to the suppression of bad news and, therefore, a bias
towards optimism. This is true, in particular, when the information is
incomplete and in situations of uncertainty (as described above).

* Pressures on the critical path. The technical groups whose task is on
the critical path to production or operation may find themselves under
pressure to cut corners. This pressure increases with the difference of
total time (objective function) between them and the next critical
task.

* Difficulties of learning in a high-visibility situation. It may be
difficult for an organization subjected to public scrutiny to assess
its own performance and learn from its mistakes. In situations of
success, there may be a tendency to overlook signals of potential
problems, whereas in situations of difficulties, the organization may
be overwhelmed by signals of problems if it does not have clear
procedures to assess their relative severities and to set priorities
among remedial actions. Furthermore, organizational learning and in
particular change of rules may be difficult when it can be interpreted
as admitting that previous procedures were inadequate.


RETURN TO THE PRA MODEL

Assembly model

The probability of failure p(Fi) and of subsequent failures p(F|Fi) can
be linked to the occurrence of errors of different types (e.g., only a
fraction of the surface was covered with RTV) and, furthermore, to
combinations of errors (e.g., insufficient quantity of bonding or
inappropriate step to next tile due to mis-measurement). For each type
of error, the question is to know its level of severity, the number of
tiles that it can affect, and their location with respect to the
criticality partition of the orbiter surface. In addition, it may be
important to consider whether it is a gross error or an error of
judgment that may be less easily identified and corrected. An error
having occurred, the inspection process can be analyzed as a sequence
of filters: at each step the error may be identified or missed.
Finally, given that an error has occurred and been identified, it may
or may not be corrected.
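
The sequence-of-filters view of inspection can be sketched as follows.
This is an illustrative sketch only: it assumes that inspection steps
are independent and that a single correction probability applies once
the error is identified, neither of which is stated in the paper, and
the numerical values are hypothetical.

```python
def p_error_survives(p_detect_per_step, p_correct_given_detected):
    """Probability that an existing error is still present after inspection.

    Each inspection step acts as an independent filter that identifies the
    error with the given probability (assumption). An identified error is
    corrected with probability p_correct_given_detected.
    """
    p_missed_everywhere = 1.0
    for p_detect in p_detect_per_step:
        p_missed_everywhere *= (1.0 - p_detect)
    p_identified = 1.0 - p_missed_everywhere
    # The error survives if it is never identified, or identified but not corrected.
    return p_missed_everywhere + p_identified * (1.0 - p_correct_given_detected)


# Hypothetical values: two inspection steps, each catching half of the errors,
# and a 90 percent chance of correction once an error is identified.
remaining = p_error_survives([0.5, 0.5], 0.9)
```

Multiplying this survival probability by the probability that the error
occurs in the first place gives one contribution to the distribution of
p(Fi).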

This analysis is described by the influence diagram shown in Figure 4.
The result is a distribution for the probability of initiating failure
p(Fi) given possible combinations of errors and their levels of
severity and the distribution of the number of tiles affected. This
distribution of values of p(Fi) is then entered in the previous model
to obtain a spectrum of failure probabilities (LOV/C) due to failure of
the TPS. The model can then be used to test the effects of
organizational improvements designed to increase the reliability of the
TPS.

Examples of organizational improvements of TPS management that can be
analyzed through the model:


* Improve organizational learning. Possible measures include trend
analysis and feedback mechanisms. Their effect, in the model, is to
decrease the probability of occurrence of errors in the first place.
Also, improvement of the testing (such as the testing of RTV for aging
effects), whose effect is to decrease the probability of failure.

* [...] to the criticality of the tile location can be analyzed by the
model through the


[...] of the probability of initiating failure in the most critical
[...]

* [...] retrieval of information increase the probability of
observation [...] conditional on occurrence and increase the
probability of correction conditional on observation.

CONCLUSION

The extensions of classical PRA presented in this paper increase
considerably the value of information of such studies because they
allow setting priorities among a larger number of potential
improvements. An analysis of the engineering process allows focusing
attention and resources (time in particular) on the most critical
tasks. Organizational aspects of engineering reliability are of
interest to researchers in organizational behavior. The method outlined
here allows inclusion of this body of knowledge in the decision making
process by assessing the relative importance of these organizational
effects through their contribution to the overall system reliability.
INCLUDES: VEHICLE CONFIGURATION (TPS RELEVANT)

TILE AND SIP, FIB, SCREED, PVT, BV
DESIGN GAP FILLERS (NEW)

F/B ANOMALIES (NEW)

S/G ON ORIGINAL BUILD PLUS ON OCCASION
ENGINEERING DATA & REQUIREMENTS FOR LAST FLOW OF TILE
PR'S RESULTING IN TILE OR FIB REMOVAL, MR, SHAVED
DOESN'T INCLUDE: TCS, THERMAL BARRIERS
MANY TPS REPAIRS
CAN SORT BY: MULTIPLE FIELDS

INCLUDES INFORMATION BACK TO STS-4

AT THE END OF A FLOW, FLIGHT DAMAGE RECORDS ARE [...] FROM ACTIVE DATA

ACTIVE DATA BASE FOR TILE REMOVAL GOES BACK THREE FLIGHTS
EARLIER DATA CAN BE ACCESSED ON REQUEST
specification requirement
CRACKED, GOUGE)

3 ENGINEERING EVALUATION
A. ENGINEERING CHANGE (MCR, EO, SAR)


INSTALLATION DATA: REMOVAL CODES) 5 CHARRED/DAMAGED FILLER BAR
9 SCREED/HEATSINK
5. ACCESS
• TERMINALS AVAILABLE AT DOWNEY, KSC, JSC

• DATA ENTERED BY VERY KNOWLEDGEABLE TPS OPERATORS

• DATA BASE NOT "CONTROLLED"

• WIDESPREAD  USE  BY  TPS  COMMUNITY 

• NO  FORMAL  TRENDING 

• GRAPHICS  CAPABILITY  IS  AVAILABLE 
ABSTRACT


During the post-Challenger investigation the National Research Council
Shuttle  Criticality  Review  and  Hazard  Analysis  Audit  Committee  expressed 
concern  that  the  approximately  1,300  safety-critical  failure  points  were  not 
prioritized  based  on  probability  of  occurrence.  They  suggested  that  an 
integrated  systems  assessment  be  devised  which  would  provide  for  failure 
probability  quantification.  The  National  Space  Transportation  System  Program 
Office  subsequently  initiated  a pilot  project  employing  the  probabilistic 
risk  assessment  (PRA)  methodology  to  evaluate  its  usefulness  and  also  to 
identify  any  areas  of  concern  not  previously  established. 

This  report  describes  the  PRA  performed  on  the  Shuttle  Main  Propulsion 
Pressurization  System,  which  is  an  assembly  of  many  components  contained 
within  three  of  the  four  vehicle  elements  (the  Orbiter,  the  External  Tank, 
and  the  main  engine)  and  which  crosses  the  element  interfaces.  The  PRA  was 
performed  by  Lockheed  Engineering  and  Management  Services  Company  in 
conjunction  with  the  Lockheed  Missiles  and  Space  Company  Research  and 
Development  Division.  The  report  includes  a discussion  of  the  scope  of  the 
analysis,  a description  of  the  team  organization,  a description  of  the  PRA 
methodology  and  its  application  in  this  study,  and  a summary  of  lessons 
learned.  A matrix  is  also  provided  to  map  the  information  in  this  report  to 
the  information  in  the  analysis  report  (LMSC-F2230402,  January  1988),  which 
is  provided  as  an  attachment. 




CONTENTS


Section

1.  BACKGROUND
2.  SCOPE OF ANALYSIS
3.  TEAM ORGANIZATION
4.  PRA METHODOLOGY
    4.1  SYSTEM DEFINITION
    4.2  FAULT TREE
    4.3  FAILURE RATE DATA BASE DEVELOPMENT
    4.4  PRA SOFTWARE
    4.5  SUMMARY OF PRA METHODOLOGY
5.  LESSONS LEARNED
    5.1  MPPS PROBLEM AREAS
    5.2  USEFULNESS OF PRA FOR NASA
    5.3  RISK HIERARCHY
    5.4  KNOWLEDGE CAPTURE
    5.5  COMPLEXITY OF THE PRA METHODOLOGY
    5.6  THE FAILURE RATE DATA BASE
    5.7  SOFTWARE CONSIDERATIONS
    5.8  PRA AS A MANAGEMENT TOOL
6.  REFERENCE MATRIX

ATTACHMENT 1
    VOLUME I.  SECTIONS 1 - 7
    VOLUME II.  APPENDICES A - D
    VOLUME III.  APPENDICES E - K




FIGURES


Figure

3-1  Team organizational responsibilities
4-1  Space Shuttle Main Propulsion System Pressurization System PRA flow
4-2  [...]
4-3  Fault tree architecture


ACRONYMS


CIL         critical items list
DOD         Department of Defense
ET          external tank
FMEA        failure modes and effects analysis
GSE         ground support equipment
HA          hazard analysis
LEMSCO      Lockheed Engineering and Management Services Company
LHS/TEMAC   Latin Hypercube Simulation/Top Event Matrix Analysis Code
LLNL        Lawrence Livermore National Laboratories
LMSC        Lockheed Missiles and Space Company
LRU         line replaceable unit
MIL-SPEC    military specification
MPPS        Main Propulsion Pressurization System
NASA        National Aeronautics and Space Administration
NSTS        National Space Transportation System
OMS         Orbital Maneuvering System
OPF         Orbiter Processing Facility
PC          personal computer
PRA         probabilistic risk assessment
PRACA       Problem Reporting and Corrective Action
R&DD        Research and Development Division
SAIC        Science Applications International Company
SSME        Space Shuttle Main Engine
STS         Space Transportation System
SR&QA       Safety, Reliability, and Quality Assurance


1.  BACKGROUND 


During the post-Challenger investigation the National Research Council
Shuttle Criticality Review and Hazard Analysis Audit Committee expressed
concern that the approximately 1,300 safety-critical failure points were not
prioritized based on probability of occurrence. They suggested that an
integrated systems assessment be devised which would provide for failure
probability quantification. The committee further recommended that the
assessment be closely coupled with the existing failure modes and effects
analysis/critical items list (FMEA/CIL) activity to assure coverage of the
truly safety-critical items in the Space Transportation System (STS).

The  National  Space  Transportation  System  (NSTS)  Program  Office  initiated  a 
pilot  project  employing  the  probabilistic  risk  assessment  (PRA)  methodology 
to  evaluate  its  usefulness  and  also  identify  any  areas  of  concern  not 
previously  established.  This  methodology  has  been  used  successfully  by  the 
nuclear  industry  in  analyzing,  quantifying,  and  prioritizing  the  risks 
presented  by  nuclear  power  plants. 

This  report  describes  the  PRA  performed  on  the  Shuttle  Main  Propulsion 
Pressurization  System  (MPPS).  The  MPPS  is  an  assembly  of  many  components 
which  is  contained  within  three  of  the  four  vehicle  elements  (the  Orbiter, 
the  External  Tank  (ET),  and  the  main  engine)  and  which  crosses  the  element 
interfaces.  The  PRA  was  performed  by  Lockheed  Engineering  and  Management 
Services  Company  (LEMSCO)  in  conjunction  with  the  Lockheed  Missiles  and  Space 
Company (LMSC) Research and Development Division (R&DD).

A summary  of  the  conclusions  found  herein  is  as  follows: 

a.  The  PRA  methodology  and  the  NSTS  FMEA/CIL  techniques  complement  each 
other,  and  together  provide  an  enhanced  approach  to  risk  management. 

b.  The  PRA  methodology  is  adaptable  to  NASA  space  systems  and  is  usable 
throughout  the  NASA  organizational  environment. 
c.  The PRA methodology can be learned and applied, using the currently
available  tools,  by  any  integrated  aerospace  organization,  and  does  not 
require  extensive  training. 

This report begins with a discussion of the scope of the analysis in
section 2. This discussion is followed in section 3 by a description of
the PRA team organization, including the skill mix and experience of the
team personnel. Section 4 describes the PRA methodology and its
application in the study of the MPPS. The lessons which were learned from
the study are contained in section 5. Section 6 provides a matrix to map
the information outlined in this report to the detailed information
contained in the LMSC R&DD analysis report (LMSC-F2230402, January 1988),
which is provided as an attachment.
2.  SCOPE OF ANALYSIS

The MPPS which NASA requested be analyzed using PRA methodology comprises
those elements which furnish the pressurant gas at the necessary
conditions for proper and safe operation of the entire Main Propulsion
System, from the beginning of ground operations to successful return of
the Orbiter to earth. A complete analysis would have required
consideration of the entire Main Propulsion System, which includes the ET,
the Orbiter, the Space Shuttle Main Engine (SSME), and the ground support
equipment (GSE). Such a task was a far greater effort than was required
to meet NASA's objectives of demonstrating the usefulness of the PRA
methodology for manned space applications in a reasonable time and at a
reasonable cost.

The MPPS, which was defined with NASA's concurrence, can best be
described as a collection of functions which cross many system boundaries
rather than as a well-defined system in itself. The following functions are
considered:

a.  Supply  of  pressurant  gas  to  the  ET  to  prevent  its  collapse  from  external 
pressure,  and  to  provide  sufficient  positive  pressure  to  prevent 
cavitation  of  the  SSME  pumps. 

b.  Supply  of  purge  gases  and  gases  to  inert  the  system,  minimizing  the 
explosion  hazard. 

c.  Supply  of  pressurant  gas  to  actuate  engine  valves  as  a backup  to  a 
malfunctioning  hydraulic  system. 

d.  Supply  of  gas  as  a primary  source  of  actuation  pressure  for  various 
system  valves. 

The  analysis  considered  those  elements  and  components  of  the  Orbiter,  the  ET, 
the  SSME,  and  the  GSE  which  either  affect  the  pressurization  functions  or  are 
affected  by  the  pressurization  functions.  The  scope  of  the  analysis  led  to 
partial  inclusion  of  the  Electrical  Power  Distribution  and  Control  System, 
the  Electronic  Instrumentation  and  Control  System,  and  the  Hydraulic  Power 
System,  as  well  as  operational  considerations.  The  scope  of  work  chosen  for 
the analysis, while abbreviated from what would be considered in a complete
system  approach,  provided  extremely  useful  results  and  demonstrated  the 
effectiveness  of  the  PRA  methodology. 


3.  TEAM  ORGANIZATION 


The  team  organization  is  shown  in  figure  3-1.  The  task  was  administered  by 
the  NSTS  Program  Office  with  technical  management  assigned  to  the  Propulsion 
and  Power  Division  of  the  Engineering  Directorate.  LEMSCO  provided  program 
management,  performed  engineering  and  organizational  support,  and  provided 
the liaison with other related NASA organizations. LMSC R&DD performed the
PRA analysis and delivered an analysis report (LMSC-F2230402, January 1988)
which  is  included  as  attachment  1.  The  engineering  and  analysis  teams 
consisted  of  personnel  having  a broad  mix  of  PRA  and  spacecraft  systems 
engineering  expertise,  including  new  college  graduates,  journeyman-level 
engineers  with  PRA  and  system  engineering  experience,  senior  project 
engineers,  and  managers.  These  teams  possessed  little  or  no  experience  with 
propulsion  systems.  They  were  assisted  by  consultants  who  contributed  an 
understanding  of  the  system's  operation  and  an  understanding  of  NASA's  needs 
across the communities of engineering; safety, reliability, and quality
assurance  (SR&QA);  program  management;  and  PRA  peer  disciplines.  Support  for 
the  site-licensed  PRA  CAFTA  software  was  provided  by  the  vendor,  Science 
Applications  International  Company  (SAIC). 

A subsequent  independent  peer  review  of  the  PRA  was  accomplished  by  Lawrence 
Livermore  National  Laboratories  (LLNL)  under  the  sponsorship  of  the  NSTS 
Program  Office.  This  review  was  supported  by  the  LMSC  analysis  team,  and  the 
LLNL  review  results  and  team  responses  are  included  in  attachment  1. 

The team mix brought these various specialty areas to the project; however,
the systems engineering specialists originally had no knowledge of PRA
methodology, nor did the PRA specialists have any knowledge of manned
spacecraft systems. LEMSCO personnel and the consultants provided the
understanding of engineering, operations, and NASA's SR&QA techniques and
policy. LMSC provided the necessary expertise in PRA analysis techniques.
Meetings and working sessions between the groups provided the necessary
cross-fertilization across disciplines. This interdisciplinary exchange
required by the process made it evident that PRA would be especially useful
on a new project as an integrated activity during the design, development,
and test phases rather than as a separate appraisal after vehicle
development is mature.
Figure 3-1.-  Team organizational responsibilities.
4.  PRA METHODOLOGY


Figure 4-1 is a graphical summary of the process flow used to perform the
MPPS PRA. The figure depicts the following process elements: a system
definition, failure rate data base development, and PRA software tools.
Figures 4-2 and 4-3 are provided for discussion of fault trees contained in
section 4.2.

4.1  SYSTEM  DEFINITION 

The  most  important  step  in  a PRA  project  is  system  definition.  This  is 
accomplished  by  an  engineering  review  of  all  system  and  component  documents, 
drawings,  and  schematics  to  provide  a clear  understanding  of  the  system 
requirements  and  operation.  This  allows  the  creation  of  a system  definition 
and  the  establishment  of  boundaries  defining  what  will  be  included  in  the 
scope  of  the  study.  This  was  difficult  because  the  MPPS  crossed  many  Shuttle 
element  boundaries  and  mission-operational  regimes. 

4.2  FAULT  TREE 

The  end  product  of  this  analysis  is  a fault  tree  whose  top  level  is  shown  in 
figure  4-2.  The  fault  tree  in  itself  does  not  reflect  the  system  reliability 
or  likelihood  of  failure.  PRA  assumes  that  components  fail;  hence,  it  is 
necessary  to  characterize  all  malfunctions  as  failures  at  the  component  or 
functional  level.  The  fault  tree  is  constructed  in  a logical  manner  to 
depict  the  relationships  between  failures.  This  requires  generation  of 
failure  modes  as  had  previously  been  done  by  the  NSTS  program  using  the  FMEA 
technique.  These  previously  generated  FMEA's  and  hazard  analyses  (HA's)  were 
used  to  complement  and  validate  the  current  fault  tree  analysis.  It  is  then 
necessary  to  determine  whether  failures  of  one  or  more  components  or 
functions  at  any  level  will  propagate  into  the  top  level  event,  which  is  loss 
of  life  or  vehicle.  This  is  illustrated  in  figure  4-3.  The  fault  tree 
progresses  from  the  top  event  through  the  definition  of  mission  phases, 
categories  of  failures,  and  definition  of  contributing  functions.  The  bottom 
level represents failures or groups of failures which contribute to
system-level failures. The fault tree simply provides a mapping of all
failure events which can progress to a top event. The analysis is
iterative, as the PRA analyst modifies the engineering fault tree model to
improve computational efficiency while preserving engineering clarity.


Figure 4-1.-  Space Shuttle Main Propulsion System
Pressurization System PRA flow.


Figure 4-3.-  Fault tree architecture. Examples at each level of the tree:

Top event:              Loss of Life or Vehicle; Loss of Engine;
                        Loss of Mission; Very Expensive Failure
Mission phase:          Prelaunch Countdown; OPF Checkout;
                        Ascent, Orbit, Landing; OMS Burn
Failure category:       External Tank Related Failure; Engine Shutdown
                        Related Failure; Fire or Explosion; Propellant
                        Flow Related Fault; Engine Thrust Failure
Contributing function:  Pogo Suppression Failure; Helium System
                        Depressurization; LH2 Leak; LH2 Tank
                        Overpressurization; Valve Failure
Bottom-level failure:   Line/Weld/Flange; HEX Coil Rupture; GH2 Flow
                        Contamination; Pump Cavitation;
                        2 of 3 Flow Control Valves

The fault tree does not reflect the degree of system usefulness. In the
original application of PRA, response to failure results in a safe shutdown.
In manned space operations, safe shutdown of critical components or functions
is not acceptable, and it is necessary to continue the mission using
redundant systems which have not failed or default to operational
workarounds, or to continue operations during a safe abort.

At the top of the fault tree in figure 4-2 is the top-level event, loss of
life and/or vehicle, which would result from a failure in some component of
the MPPS which propagated to the top-level event. The system was analyzed in
three phases of mission operation: (1) prelaunch, (2) powered flight, and
(3) ET separation. Each phase has an operating environment so distinctive
that the three phases were identified as the second level of the tree. Each
phase then forms its own unique tree, and the software treats each
separately. The next level of the tree provides categories of failures which
can cause the top-level event to occur. Examples of these are ET-related
failures and catastrophic failure due to fire and overpressure. Finally, the
malfunctions which may, by themselves or in combination with others, cause
the loss of function were identified and placed in the fault tree as basic or
bottom-level events.
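
The way bottom-level events combine into a higher-level event can be
illustrated with a minimal sketch. It assumes statistically independent
basic events and uses hypothetical probabilities and event groupings; it
is not the CAFTA algorithm and does not reproduce the MPPS fault tree.
The k-of-n gate corresponds to entries such as "2 of 3 Flow Control
Valves" in figure 4-3.

```python
from itertools import combinations
from math import prod


def or_gate(probs):
    """Output occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(probs):
    """Output occurs only if all independent input events occur."""
    return prod(probs)


def k_of_n_gate(k, probs):
    """Output occurs if at least k of the independent input events occur."""
    n = len(probs)
    total = 0.0
    for m in range(k, n + 1):
        for failed in combinations(range(n), m):
            term = 1.0
            for i in range(n):
                term *= probs[i] if i in failed else (1.0 - probs[i])
            total += term
    return total


# Hypothetical basic-event probabilities (assumptions for illustration only).
valves = k_of_n_gate(2, [0.1, 0.1, 0.1])   # 2 of 3 flow control valves fail
regulators = and_gate([0.05, 0.05])        # both redundant regulators fail
leak = 0.01                                # single line/weld/flange leak
top = or_gate([valves, regulators, leak])  # any category causes the top event
```

In this sketch the single leak needs no second failure to reach the top
event, whereas the redundant items contribute only through joint
failures.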

The  fault  tree  indicates  those  events  which  were  not  analyzed,  as  well  as 
those that were. The diamond symbol under the event "catastrophic failure
due  to  internal  missile  generation"  indicates  that  this  event  was  not 
analyzed.  The  triangle  symbol  indicates  continuation  of  the  tree  event  in 
more  detail.  The  complete  fault  tree  is  quite  detailed  and  fills 
approximately  150  pages  similar  to  figure  4-2. 


4.3  FAILURE  RATE  DATA  BASE  DEVELOPMENT 

As indicated in figure 4-1, it was necessary to obtain failure rate data on
the various components in order to get a relative ranking of the failures.
NASA sources such as Problem Reporting and Corrective Action (PRACA) were
inadequate, either because of the small number of samples or because
service life and operation cycles were not available. The PRA team
extracted, from Department of Defense (DOD) sources such as military
handbooks and Rome Air Development Center notebooks, generic failure rate
data on similar components in an operating environment very close to that
experienced on Shuttle flights. These data proved to be more acceptable
than was originally anticipated.
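
As an illustration of how generic failure rates yield a relative
ranking, the sketch below converts a rate into a probability of failure
over an exposure time. The constant-failure-rate (exponential) model is
a common convention assumed here, not something the report states, and
the component names, rates, and exposure time are hypothetical.

```python
import math


def failure_probability_from_rate(rate_per_hour, exposure_hours):
    """P(failure within the exposure time), assuming a constant failure rate."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-rate_per_hour * exposure_hours)


# Hypothetical generic data: component -> (failures per hour, exposure hours).
components = {
    "valve": (2.0e-6, 0.15),
    "regulator": (5.0e-6, 0.15),
    "line": (1.0e-7, 0.15),
}

probabilities = {
    name: failure_probability_from_rate(rate, hours)
    for name, (rate, hours) in components.items()
}

# Relative ranking, most likely failure first.
ranked = sorted(probabilities, key=probabilities.get, reverse=True)
```

Because only the ordering is needed for a relative ranking, moderate
errors in the generic rates matter less than they would for an absolute
risk estimate.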

4.4  PRA  SOFTWARE 

In  reference  to  figure  4-1,  the  software,  CAFTA,  was  provided  under  license 
from  SAIC.  This  software  greatly  simplifies  the  PRA  analysis  process.  It  is 
used  in  developing  and  updating  the  fault  trees,  can  be  used  to  manage  the 
failure  rate  data  base,  can  quantify  and  prioritize  the  various  failures,  and 
can  be  used  in  sensitivity  analyses  where  the  effect  of  component  reliability 
on  the  probability  of  a top-level  event  occurring  can  be  evaluated. 
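
A one-at-a-time sensitivity analysis of the kind described can be
sketched as follows. The small top-event model, the event names, and the
probabilities are hypothetical and assume independent events; the sketch
does not represent the CAFTA interface.

```python
def top_event_probability(basic):
    """Hypothetical top event: a leak OR failure of both redundant valves."""
    redundant_pair = basic["valve_a"] * basic["valve_b"]
    return 1.0 - (1.0 - basic["leak"]) * (1.0 - redundant_pair)


def sensitivity(model, basic, factor=10.0):
    """Ratio of perturbed to baseline top-event probability, per basic event."""
    base = model(basic)
    ratios = {}
    for name, p in basic.items():
        perturbed = dict(basic)
        perturbed[name] = min(1.0, p * factor)  # degrade one component only
        ratios[name] = model(perturbed) / base
    return ratios


# Hypothetical basic-event probabilities (assumptions for illustration only).
basic = {"leak": 1e-3, "valve_a": 1e-2, "valve_b": 1e-2}
ratios = sensitivity(top_event_probability, basic)
```

With these numbers, degrading the single leak source raises the
top-event probability far more than degrading either redundant valve.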

The  software,  which  was  run  on  an  IBM  PC-AT,  was  easy  to  use  and  understand. 
Training  times  were  short  for  PRA  team  members  who  had  no  previous  exposure 
to  PRA.  The  load  imposed  by  the  MPPS  analysis  taxed  the  limits  of  the 
addressable  memory  of  the  machine  and  indicates  that  more  complex  systems 
will  require  something  larger  than  a personal  computer  (PC). 

4.5  SUMMARY  OF  PRA  METHODOLOGY 

The  PRA  methodology  accomplishes  what  is  intended.  It  provides  an  accurate 
representation  of  failure  scenarios,  pinpoints  weak  areas  in  system  design, 
flags  those  areas  requiring  more  attention,  and  prioritizes  the  various 
categories  of  failures. 
Its weaknesses are as follows:

a.  It  cannot  test  for  model  completeness. 

b.  Its  quantitative  results  are  limited  by  the  quality  of  input  data. 

c.  The  analysis  may  be  simplistic  in  its  representation  of  the  system-level 
behavior. 

Fortunately, these weaknesses can be minimized or eliminated by use of the
FMEA and HA techniques; thus PRA and FMEA, when used together, complement
each other.


5.  LESSONS  LEARNED 


As a result of the MPPS pilot project experience, eight major lessons were
learned. These lessons are discussed in sections 5.1 through 5.8.

5.1  MPPS  PROBLEM  AREAS 

No  new  problem  areas  were  identified  by  the  PRA  study.  This  is  not 
surprising,  since  the  Shuttle  is  a mature  engineering  system  that  has 
undergone  years  of  development  and  study.  This  observation  lends  additional 
confidence  to  the  FMEA/CIL  process. 

The study identified the single largest category of catastrophic failures to
be those associated with leakage of pressurized mechanical system components
which results in explosion or compartment overpressurization. This single
failure category contributes over 84 percent of the MPPS risk. The addition
of functional redundancy will not, in general, reduce overall risk.
Additional piping and components containing propellants or pressurants
increase, rather than decrease, the catastrophic risk sources, with the
resulting failure probability growing at a polynomial rate. It would appear
more beneficial to emphasize controlling the direct sources of risk through
ground maintenance and early leak detection.
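
The effect of adding leak sources can be illustrated with a short
sketch. It assumes that every pressurized component has the same
independent leak probability and that any single leak is catastrophic;
both are simplifying assumptions, and the component counts and
probability are hypothetical.

```python
def p_any_leak(p_leak_per_component, n_components):
    """Probability that at least one of n independent components leaks."""
    return 1.0 - (1.0 - p_leak_per_component) ** n_components


# Hypothetical comparison: a baseline design and the same design with a
# redundant string that adds 50 more pressurized components.
baseline = p_any_leak(0.001, 100)
with_redundant_string = p_any_leak(0.001, 150)
```

Under these assumptions the added hardware raises the leak-driven risk
because each new component is itself a leak source, whatever functional
redundancy it provides.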

5.2  USEFULNESS  OF  PRA  FOR  NASA 

The PRA has the ability to quantify risk. The FMEA methodology does not.
Not only does the FMEA process ignore quantification in general, but it (by
definition) cannot consider "multiple failure modes." In principle, such
analyses could be performed, but the question would always remain whether all
reasonable combinations had been considered. The heart of a PRA study is its
"top down" methodology, in which the system is dissected and quantified free
from designer-level prejudice.

5.3  RISK  HIERARCHY 

The PRA analysis yielded an effective ranking of the risks relative to loss
of life and/or vehicle due to MPPS failure. Sensitivity computations served
to verify the internal consistency of the fault tree. The PRA methodology
points toward an objective resolution of the conflicts of traditional
engineering analysis. If the results of the PRA study are disputed, it is
necessary to identify and resolve the flaws, either in the fault tree or in
the assigned failure rate data. The bottom line is that the study provided
an explicit quantification of the risks inherent to the MPPS.

5.4  KNOWLEDGE  CAPTURE 

Two  major  products  of  the  PRA  analysis  were  the  fault  tree  and  the  associated 
MPPS  system  description.  In  retrospect,  it  is  evident  that  both  of  these 
products  serve  a purpose  beyond  their  immediate  intent,  in  that  they  provide 
a vehicle  for  knowledge  transfer.  To  be  precise,  the  system  description  was 
generated  because  there  was  no  comparable  document  in  the  NASA  literature. 

It  organized  information  available  in  many  sources  into  a comprehensive 
system  description.  The  fault  tree,  originally  developed  to  support  the 
quantification  critical  to  the  PRA  procedure,  also  served  to  reinforce  the 
system  description.  These  products  capture  corporate  knowledge  far  beyond 
their  obvious  intent. 

5.5 COMPLEXITY OF THE PRA METHODOLOGY

Contrary  to  expectations,  the  PRA  methodology  proved  to  be  easily  understood 
by  the  technical  staff.  There  are  subtleties  that  require  specialized 
knowledge,  but  the  project  staff  had  no  trouble  in  absorbing  the  general 
technique.  This  is  especially  noteworthy  when  one  considers  the  diverse 
composition  of  the  engineering  and  analysis  teams.  In  actual  fact,  it  was 
found  that  the  PRA  analysis  process  provided  a common  forum  which  encouraged 
inputs  from  the  various  engineering  and  SR&QA  disciplines.  The  PRA  process 
demands  of  its  practitioners  a commitment  to  excellence,  and  all  members  of 
the  team  responded  to  the  challenge. 

5.6  THE  FAILURE  RATE  DATABASE 

The study illustrated the inadequacy of the extant NASA data base for failure
rate data. In general, the problem is easily described: the current data
reflect failures, but without quantification as to time, cycle, cause, or
detail. On the other hand, the MIL-SPEC generic data bases proved quite
adequate to the quantification of the MPPS fault tree. This indicates that
effective analyses can be performed in the absence of Shuttle-specific data,
though the latter is clearly preferred.

5.7  SOFTWARE  CONSIDERATIONS 

The  computer  software  was  crucial  to  the  success  of  the  MPPS  analysis.  In 
particular,  the  CAFTA  fault  tree  analysis  program  allowed  easy  development 
and  manipulation  of  the  fault  tree.  The  Latin  Hypercube  Simulation/Top  Event 
Matrix  Analysis  Code  (LHS/TEMAC)  sensitivity  codes  allowed  the  PRA  team  to 
perform  computations  that  were  far  beyond  the  capability  of  hands-on 
calculation. On the minus side, the magnitude of the MPPS project taxed the
CAFTA program to its limits. It is clear that software support is necessary,
and that studies larger than the MPPS will require expansions of computer
capability (more memory and better program integration).
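
The kind of computation that the sensitivity codes automate can be sketched as
follows. This is a generic Latin Hypercube illustration and not the LHS/TEMAC
implementation: the two basic events, their lognormal medians and error
factors, and the OR-gate top event are all hypothetical assumptions chosen for
the example.

```python
import math
import random
from statistics import NormalDist

Z_95 = 1.645  # standard normal 95th percentile


def latin_hypercube(n_samples, n_vars, rng):
    """Return n_samples points in the unit cube, one per stratum per variable."""
    columns = []
    for _ in range(n_vars):
        # One uniform draw inside each of n_samples equal-width strata,
        # then shuffle so strata are paired at random across variables.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


def lognormal_quantile(u, median, error_factor):
    """Map u in (0, 1) to a lognormal value; error_factor = 95th pct / median."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its open domain
    sigma = math.log(error_factor) / Z_95
    return median * math.exp(sigma * NormalDist().inv_cdf(u))


def top_event_samples(n_samples, seed=0):
    """Propagate two hypothetical uncertain basic events through an OR gate."""
    rng = random.Random(seed)
    results = []
    for u_leak, u_valve in latin_hypercube(n_samples, 2, rng):
        p_leak = lognormal_quantile(u_leak, 1.0e-4, 3.0)    # hypothetical
        p_valve = lognormal_quantile(u_valve, 5.0e-5, 10.0)  # hypothetical
        results.append(1.0 - (1.0 - p_leak) * (1.0 - p_valve))
    return sorted(results)


samples = top_event_samples(200)
median_estimate = samples[len(samples) // 2]
p95_estimate = samples[int(0.95 * len(samples))]
print(f"median ~ {median_estimate:.2e}, 95th percentile ~ {p95_estimate:.2e}")
```

Stratifying each input guarantees that the tails of every distribution are
sampled even with a modest number of runs, which is the usual reason for
preferring Latin Hypercube sampling over simple random sampling.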

5.8  PRA  AS  A MANAGEMENT  TOOL 

The  immediate  results  of  the  MPPS  PRA  study  provide  a convenient  tool  for 
management,  in  that  the  resulting  risk  hierarchy  aids  in  the  allocation  of 
sometimes  scarce  engineering  resources.  Furthermore,  the  fault  tree  and  its 
associated  quantification  are  extremely  flexible  in  practical  application. 

For  example,  once  the  basic  fault  tree  and  risk  data  base  are  in  place,  it  is 
easy  enough  to  reflect  changes  in  the  MPPS  system,  simply  by  editing  the  tree 
or data base. The products of the analysis serve as a flexible and visible
model of the MPPS system.


ABSTRACT


A number of safety studies, failure modes and effects analyses (FMEAs) and
hazards analyses have been performed for the Space Shuttle Main Propulsion
Pressurization System (MPPS). The method of analysis in each of these
evaluations has been deterministic: one in which a source of hazard is
identified, the impact is determined and appropriate control measures are
applied. These studies are exhaustive and comprehensive in identifying the
credible modes and mechanisms of individual component failures and in assessing
the impact of these failures on the system. The studies, however, do not
account for the effects of multiple failures occurring simultaneously, nor do
they quantify the likelihood of such failures. This study attempts to quantify
the likelihood of a catastrophic accident by utilizing Probabilistic Risk
Assessment (PRA) techniques. The results of this study identify that the major
contributors to catastrophic failure of the MPPS are associated with hardware
leakage and rupture which result in explosion or aft compartment
overpressurization. Breach of pressure boundary is the direct result of random
seal/weld/joint leakage and loss of component structural integrity.


Section 1
INTRODUCTION


In January 1987, the National Research Council Risk Oversight Panel
recommended that NASA perform a Probabilistic Risk Assessment (PRA) of several
Space Shuttle systems. In response to the recommendation, the NASA National
Space Transportation System Program Office requested that Lockheed Engineering
and Management Services Co. (LEMSCO) perform a PRA on the Shuttle Main
Propulsion Pressurization System (MPPS). The intent of this effort was to
determine if any areas of concern not previously identified by the FMEA/HA were
uncovered and to evaluate the usefulness of PRA methodology. This effort
parallels a similar task currently being performed by McDonnell Douglas on the
Auxiliary Power Unit (APU) which supports hydraulic power generation for the
Shuttle Main Engine and the Flight Control System.

Under the direction of LEMSCO, Lockheed Missiles & Space Company's (LMSC)
R&D Division was commissioned to perform a comprehensive evaluation of risk
posed by the MPPS during flight and preflight phases. Periodic meetings were
held between NASA and LEMSCO/LMSC to further define the scope of analysis and to
discuss specific risk issues of interest within the MPPS.

The principal purpose of this study is to quantify in probabilistic terms
the risk which the Space Shuttle's MPPS poses to human life and property. PRA
is the analysis technique used for this purpose. A description of the
historical use of PRA as an analytical tool and a definition of MPPS boundaries
considered within the scope of analysis are provided in the following
paragraphs.


1.1 USE OF PRA

PRA is a method of quantifying the probabilities of potential accidents and
their consequences. PRA employs fault tree analysis (FTA) to develop and
evaluate a system model as well as to analyze consequences and their associated
risks. PRA has been used as a technique to formally address these risks at
nuclear power plants since the Reactor Safety Study, WASH-1400, was performed in
1975. Prior to WASH-1400, the Boeing Corporation applied PRA to an evaluation
of the Minuteman missile. The aerospace industry initially viewed PRA as too
expensive and subsequently replaced it with non-probabilistic (i.e.,
deterministic) methods such as Failure Modes and Effects Analyses (FMEAs) and
hazards analyses (HAs). These were the tools NASA had used to date in its
analyses of the risk posed by Space Shuttle systems.

Since WASH-1400, PRA has been applied to many other industries such as
chemical, petrochemical and defense, but not to the same extent as in the
nuclear industry. Consequently, the methods of analysis and the computer codes
used to solve the numerical computations for this study were adopted from the
nuclear industry where the PRA technique is most mature.

PRA is recognized in the nuclear industry as the best available tool for
quantifying the frequency and severity of serious accidents. PRA provides
information to support a concerted effort to identify corrective (or preventive)
actions with the greatest potential to reduce overall risk. Nonetheless, PRA is
not a stand-alone analysis for the evaluation of risk; a well-executed PRA is
based on FMEAs, hazard analyses, and other standard design activities.
Since data are often incomplete, PRA does have certain limitations which may be
summarized as follows:

o PRA may not identify all events that could start or direct
  the course of an accident. In addition, there is no test of
  model completeness (i.e., important accident scenarios could be
  unintentionally omitted by the analyst).

o Sufficient and reliable data may not be available to model
  and quantify the behaviors of system and accident processes.

o The fault tree analysis tool utilized in PRA may be simplistic in its
  representation of system level behavior.


These limitations do not, however, diminish the need for a probability-
based assessment of risk. PRA is a systematic approach to evaluating risk given
the information and understanding available at the time that the analysis is
performed. In effect, PRA is an attempt to determine: What can go wrong? How
likely is it to happen? If it happens, what are the consequences?

1.2 COMPARISON OF PRA WITH OTHER METHODOLOGIES

Qualitative techniques such as FMEA have been widely used in the aerospace
industry as a means to identify and control sources of risk. The FMEA is
essentially a bottom-up approach; each component or subcomponent is analyzed
for its failure modes, causes of its failure and the effects of its failure on
the system to which it belongs. For example, in the case of the O-rings in the
Challenger accident, the effects of a leak were correctly identified as
resulting in "high-temperature gas flow burn-through and case burst;
catastrophic failure of SRM (solid rocket motor); mission loss; vehicle loss and
personnel loss". Nevertheless, in the case of Challenger, a decision was made to
launch despite the existence of this and hundreds of other identified single
point failures.


FMEAs and other hazards analyses are also limited in that they consider
occurrence of only one failure at a time. The logical connection between events
and systems is not apparent from the FMEA documentation. In many situations,
subtle interactions between various systems or between man and machine are
missed in the consideration of individual component failures in the FMEA
approach. (See Table 1-1 for a comparison of the advantages and disadvantages
of FMEAs and FTAs.) Combinations of events that can lead to failure may have
a greater probability of occurrence than single failures; yet the FMEA is not
designed to address combinations of failure.

By contrast, the FTA is a top-down approach; a top level event is first
identified, such as "failure of the MPPS which results in loss of life and/or
vehicle". Then the possible failure combinations causing this event are
developed. For each event, contributory events or chains of events are
successively developed, until arriving at the basic events, which are usually
single component failures or human errors. By this method a downward branching
fault tree is formed. Figure 2-2 consists of the top branches of the fault tree
generated for this report. Using Boolean "and" and "or" logic, the total
probabilities of various failures are calculated and their relative
contributions to the total risk are assessed. Refer to Figure 3-2 for
definitions of symbols.
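
The Boolean gate arithmetic can be illustrated with a short sketch. It assumes
statistically independent basic events; the event names and probabilities are
hypothetical and do not come from the MPPS fault tree.

```python
from functools import reduce


def and_gate(probs):
    """All inputs occur (independent events): product of the probabilities."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)


def or_gate(probs):
    """At least one input occurs (independent events)."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)


# Hypothetical basic-event probabilities (illustrative only).
leak = 2.0e-4           # single point failure
valve_a_fails = 1.0e-3  # redundant valve A
valve_b_fails = 1.0e-3  # redundant valve B

# Both redundant valves must fail; the leak alone is sufficient.
redundant_valves = and_gate([valve_a_fails, valve_b_fails])
top_event = or_gate([leak, redundant_valves])

# Approximate relative contribution of each branch to the top event.
contributions = {
    "leak": leak / top_event,
    "redundant valves": redundant_valves / top_event,
}
for name, share in contributions.items():
    print(f"{name}: {share:.1%}")
```

With these illustrative numbers the single point failure dominates the top
event, while the redundant pair contributes well under one percent.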


1.3  ORGANIZATION  OF  THE  REPORT 

This report is divided into a main report (Volume I) and Appendices A
through K which are contained in Volumes II and III. Volume I contains the
methods by which the analysis was performed, the data sources used for
probabilistic quantification, and descriptions of the systems and events
included in the reliability model.

The executive summary, Section 2.0, discusses the major conclusions and
recommendations resulting from this PRA. Methodology and computational
techniques are presented in Section 3.0. Quantitative evaluation of reliability
is performed in Section 4. A brief description of in-scope systems and hardware
is presented in Section 5. Risk and consequence analysis is summarized in
Section 6.

Appendices contain all the supporting documentation and computation for the
technical evaluation. A brief description of the Appendices is provided below:

Appendix A:  A description of abbreviations, acronyms, initialisms and
             terms used throughout the report.

Appendix B:  A description of fault tree basic events and the shortened
             descriptors (mnemonics), along with a cross reference of
             pages where each basic event appears in the tree.

Appendix C:  Tabulated failure rates, exposure times and other supporting
             data used to calculate basic event probabilities.

Appendix D:  Detailed fault tree showing all branches expanded. A
             description of each of the branches along with rationale for
             the fault tree structure is provided.

Appendix E:  Details and outline drawings for major system components.
             These drawings supplement system descriptions.

Appendix F:  A discussion of fire and explosion caused by leakage and
             contamination. General discussion to illustrate mechanisms by
             which leakage and contamination can cause catastrophic
             failures.

Appendix G:  Ground operations and tasks which are required during ground
             fill and flight preparation.

Appendix H:  Fault tree consistency evaluation to cross index FMEA
             sequences with appropriate portions of the PRA model. This
             index is a comprehensive review of all FMEAs which are
             related to the main propulsion system.

Appendix I:  CAFTA code files used to analyze and quantify risk are
             attached to a brief synopsis of the program's capabilities.

Appendix J:  Codes used to test statistical sensitivity using Latin
             Hypercube techniques and TEMAC software.

Appendix K:  Comments resulting from Lawrence Livermore National
             Laboratory's independent review of this report are addressed
             and the impact of the comments on the report is discussed.
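
As an illustration of how a tabulated failure rate and exposure time of the
kind listed for Appendix C can be combined into a basic event probability, the
sketch below uses the constant-failure-rate (exponential) model. That model is
an assumption made for this example and is not confirmed as the formula used in
the report; the rate and exposure values are hypothetical.

```python
import math


def basic_event_probability(failure_rate_per_hour, exposure_hours):
    """P(failure within the exposure time) for a constant failure rate.

    Uses 1 - exp(-rate * time), evaluated with expm1 so that very small
    probabilities are not lost to floating-point round-off.
    """
    return -math.expm1(-failure_rate_per_hour * exposure_hours)


# Hypothetical values: a generic rate of 1e-6 failures per hour and an
# exposure of 0.15 hours (illustrative only).
p_event = basic_event_probability(1.0e-6, 0.15)
print(f"basic event probability ~ {p_event:.3e}")
```

For small products of rate and time the result is close to the product
itself, which is why rare-event probabilities are often approximated as rate
multiplied by exposure time.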

Each of the tables and figures in the Appendices is supplemented by
accompanying text and descriptions of their use within the main report.


TABLE 1-1

A COMPARISON BETWEEN FAULT TREE ANALYSIS
AND
CURRENT SAFETY ANALYSIS TECHNIQUES


                 FAULT TREE ANALYSIS               CURRENT METHOD
                                                   (FMEA/HAZARDS ANALYSIS)

ADVANTAGES       • DEDUCTIVE                       • EASILY UNDERSTOOD (INDUCTIVE)
                 • ACCOMMODATES FAILURE            • SYSTEMATIC
                   COMBINATIONS/INTERACTIONS       • COMPONENT ORIENTED
                 • PRIORITIZES THE PROBABILISTIC
                   IMPORTANCE OF FAILURES
                 • SYSTEM ORIENTED

DISADVANTAGES    • UNIQUE PERSONNEL TALENTS        • ROUTINE APPLICATION MISSES
                 • TIME INTENSIVE                    SUBTLETIES
                                                   • MAY OMIT
                                                     - HUMAN ERRORS
                                                     - SECONDARY FAILURE EFFECTS
                                                     - COMMON CAUSE FAILURES
                                                   • TRADITIONALLY LIMITED TO
                                                     SINGLE FAILURES
                                                   • OBSCURES DEPENDENCIES


Section 2

EXECUTIVE SUMMARY


In post-Challenger discussions with Congressional Committees and the
National Research Council Risk Management Oversight panel, criticism was levied
against NASA because of the inability to prioritize the 1300+ single point
failures. In the absence of a ranking it was difficult to determine where
special effort was needed in failure evaluation, in design improvement, in
management review of problems, and in flight readiness reviews. The belief was
that the management system was overwhelmed by the quantity of critical hardware
items that were on the Critical Items List and that insufficient attention was
paid to the items that required it.

Congressional staff members from Congressman Markey's committee, who have
oversight responsibilities in the nuclear industry, and specifically over the
nuclear power supplies for NASA's Galileo and Ulysses missions, felt very
strongly that the addition of PRA to the existing Failure Mode Effects
Analysis/Hazard Analysis (FMEA/HA) methods was exceedingly important. They
indicated that the PRA approach had matured to the extent that it could handle
very small failure rate data bases, such as that maintained by NASA. NASA
responded with arguments that the FMEA/HA had illuminated all significant
failure modes satisfactorily and that no failure rate data base was available.

A compromise position to evaluate PRA application to two pilot systems,
MPPS and Auxiliary Power Unit (APU), was suggested. The plan was to do a PRA on
these two sub-systems to:

1.  Identify areas of concern not previously identified by the
FMEA/HA process.

2.  Evaluate the usefulness of the PRA methodology.

The plan was put into effect and has resulted in the Lockheed PRA effort on
the MPPS. With regard to item #1 above, no new failures or combinations of
failures were identified by the PRA process. This result is not unexpected if
one considers that the MPPS is a mature system, has flown repeatedly after a
thorough design, development, test, and evaluation and has passed through a
thorough qualification and certification program, all of which effectively
detect design, manufacturing, and inspection weaknesses. In addition, the FMEA
on the MPPS elements is equally mature, as it has been scrutinized by numerous
contractors and issued twice.

The selection of the MPPS was perhaps not the best for illustrative
purposes, since it contains numerous single failure points (SFPs). The dominant
risk contributors, therefore, are associated with the individual SFPs, rather
than with the combinations of failures which the PRA highlights so effectively.

With respect to item #2 above, the usefulness of using PRA on a shuttle
sub-system is effectively demonstrated. The fault tree itself provides managers
with a logical, top-down perspective of the entire system during all mission
phases. The quantification of the various events on the tree, based on the best
generic failure rate data available (in the absence of shuttle-specific failure
rate data), combined with a Monte Carlo treatment of uncertainty, yields a
probability range for the loss of life and/or vehicle due to failures
originating in the MPPS. The PRA process serves as an excellent cross-check
against the FMEAs, building upon the knowledge of component failure modes,
causes and system effects. The determination of a ranked listing of failure
contributors provides guidance to NASA management as to where attention must be
focused to reduce risk. This ranking is seen as a useful tool in a variety of
areas involving the following program decisions:

a)  Design

b)  Failure analysis

c)  Selections of improvement changes

d)  Design review decisions by management

e)  Readiness reviews

f)  Waivers

g)  Spares provisioning plans

h)  Material review board

i)  Procurement controls

j)  Inspection planning

k)  Establishment of critical process controls

l)  Designing test programs

m)  Execution of cost benefit analysis

Some specific benefits of PRA which result from the Lockheed application of
PRA to the MPPS include:

1.  General failure categories in the top branch of the fault
tree are highlighted, thereby providing better system
insights to NASA management.

2.  An example of an inter-system dependency is the hydraulic
system, which is out of scope. The PRA quantifies
the extent to which the pneumatic system is challenged by a
failure in the hydraulic system. Inter-system dependencies
and interactions such as this are typically omitted from
NASA FMEAs.

3.  The fault tree graphically displays the limits of the
analysis; for example, contamination and ice plugging are
not treated quantitatively (because of a lack of data), but
do appear on the fault tree to highlight areas of future
investigation.

4.  The fault tree treats combinations of failures such as a
two-out-of-four criterion for the hydrogen and oxygen
depletion sensors and a two-out-of-three criterion for the
flow control valves. Common cause or mode failures such
as these are treated incompletely or not at all in NASA
FMEAs.

5.  The fault tree incorporates mission phasing by considering
the multiple consequences of failures for various mission
phases; for example, incorporation of different engine
requirements for intact abort scenarios allows for an
assessment of system reliability over the entire mission.
In contrast, the NASA FMEAs typically provide only the worst
case effect of a component failure on the system rather than
a more realistic assessment of consequence sensitive to
system modes and configurations.

6.  The fault tree offers a terse presentation of the paths
which lead to the top event catastrophe "loss of life and/or
vehicle due to failures in the MPPS". The only sequences
which appear are those which lead to failure, and these are
contained in slightly more than 150 pages. In contrast, the
NASA FMEAs contain many more pages, as they include items
which do not, either by themselves or in combination, lead
to loss of life and/or vehicle.
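
The two-out-of-four and two-out-of-three criteria named in item 4 can be
quantified with a binomial sum, sketched below. The sketch assumes that the
gate output occurs when at least k of n identical, independent components
fail; it deliberately ignores common cause coupling, and the component failure
probability P_COMPONENT is hypothetical.

```python
from math import comb


def at_least_k_of_n(k, n, p):
    """Probability that at least k of n independent, identical components fail."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


# Hypothetical failure probability of one sensor or valve (illustrative only).
P_COMPONENT = 1.0e-3

two_of_four = at_least_k_of_n(2, 4, P_COMPONENT)   # e.g. depletion sensors
two_of_three = at_least_k_of_n(2, 3, P_COMPONENT)  # e.g. flow control valves

print(f"2-out-of-4: {two_of_four:.3e}")
print(f"2-out-of-3: {two_of_three:.3e}")
```

A common cause failure would defeat the independence assumption and could
raise these values considerably, which is why such failures need separate
treatment in the fault tree.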

In conclusion, PRA, coupled with sensitivity analyses, provides a
ranking system with which to assess NASA systems throughout their
life. It is important to apply PRA early in system life, when changes
can be effected at minimal cost. It is also important to develop a
failure rate data base on program hardware to enhance the realism and
credibility of the PRA.

It is therefore recommended that PRAs be utilized at the beginning of new
NASA programs, and that selected high energy systems on space shuttle, where
catastrophic failures can be generated, be considered for a retro-active
application of PRA.

2.1  SIGNIFICANT  FINDINGS 

Table 2-1 shows a compositional breakdown of events leading to loss of life
and/or vehicle. The events have been grouped into the following general
categories:

1.  Explosion/compartment over-pressurization,

2.  Valve-related failures,

3.  Turbopump failures,

4.  Loss of Pogo suppression system,

5.  Loss of propellant system screens, and

6.  Miscellaneous.

These classifications were based on the top level fault tree model for the
MPPS presented in Figure 2-2. Details regarding the development of lower
branches in the tree are provided in Section 3.


2.1.1  Risk  Contributors 

1.  Catastrophic Explosions and Overpressurization Events

The single largest category of catastrophic failures is that associated
with the random breach of mechanical system pressure boundary. This includes
release of material through either the propellant piping/components or the
helium (He) pneumatic system. At the individual component level the failures in
the High Pressure Oxidizer Turbopump (HPOT) heat exchangers and turbopumps are
major contributors. Collectively, however, the numerous other weld joints,
seals, fittings and mechanical connections (through which gross leakage could
occur) are the most significant factor.

The mechanism for these catastrophic failures varies depending on the type
of material released. For helium system depressurization, the primary effect is
compartment overpressurization. Helium is an inert gas incapable of ignition.
The impact of gross leakage or component rupture on the Space Transportation
System (STS) is, therefore, mainly one of structural damage to the orbiter (if
vent panels in the aft compartment cannot compensate for the
overpressurization).

The accident consequence of breaching the propellant system piping and
components is immediate explosion. Most of these leaks occur within the orbiter
aft compartment, resulting in either an immediate explosion or overpressurization
of the compartment. Immediate explosion would be the result of cryogenic fluid
contacting elevated temperature sources. Overpressurization is the primary
accident consequence when immediate ignition sources are not present in the
vicinity of leakage (e.g., gaseous oxygen pressurization line). That is, gradual
or rapid depressurization leads to structural damage of the aft compartment if
pressure relief is not achieved.

2.  Valve Related Functional Failures

The most important valve-related failures are those which constitute
single point failures. Functionally redundant valves which operate independently
of each other (through separate control signals, power supplies, pneumatic
supply etc.) contribute minimally to overall risk.

The bleed valves and anti-flood valves contribute significantly to the top
event occurrence within this category of failures. In the helium system, flow
regulators comprise the most important functional failures. Failures of other
system valves, such as the external tank pressurization flow control valves and
the External Tank (ET)/orbiter disconnect valves, are minor contributors to risk.

3.  Turbopump

Turbopump failures associated with the MPPS are primarily caused by leakage
through mechanical seals.

4.  Loss of Pogo Suppression System

The valves, piping, and accumulators which comprise the Pogo suppression
system account for less than one percent of the total failure probability.
Failure to regulate low frequency oscillations is assumed to cause structural
damage to the STS and/or loss of life.

5.  Loss of Propellant System Screens

Break-apart or tearing of propellant screens (located downstream of engine
pre-valves) is assumed to cause pump binding. Fragments of the screen will
destroy the turbopump on impact. The likelihood of these single point failures
collectively amounts to less than one half of one percent of the total failure
probability.

6.  Miscellaneous

Remaining events contribute negligibly to overall risk. This category
consists primarily of spurious actuation of control circuits and other in-scope
portions of the electrical instrumentation and controls.


2.2 RECOMMENDATIONS

The key to minimizing the likelihood of catastrophic accidents is
controlling ignitable leaks and sources of compartment overpressurization.

Breaches of the pressure boundary, whether the result of random failure or
human error, are expected to account for more than four fifths of the total
risk. In comparison, failures strictly related to the functional performance of
the engine (e.g., safe engine shutdown capability) constitute a small fraction
of a percent of all events leading to catastrophic accidents.

These percentages represent preliminary findings based on issues addressed
as part of the scope of this analysis. Other risk sources associated with the
MPPS were not included in the analysis and require further investigation. A
partial list of risk sources not addressed or probabilistically quantified in
this study is contained in Section Z.G.

Addition of functional redundancy will not, in general, significantly
reduce overall risk because additional piping and components containing
propellant, hydraulic oil or high pressure helium will contribute more sources
of fire and explosion. It is therefore recommended that efforts be directed
towards controlling direct sources of explosion or those leakage and rupture
events which lead to an explosion. Table 2-2 contains recommendations based on
the PRA; additional areas requiring further investigation are itemized in Table


2.2.1 Prevention of Explosion and Overpressurization Scenarios

Explosion and overpressurization events were quantified based on generic
data for component ruptures, seal failures and other leakage terms. The data are
based on reported failures for environments and applications similar to that of
the STS. However, leakage on the STS may have a lower frequency of occurrence
than that reported in the data book due to the increased level of inspection of
hardware. Similarly, early detection of the leak may preclude catastrophic
explosions under certain scenarios. A brief discussion is provided below.

Inspections  Between  Flights 

A comprehensive investigation of the accuracy and consistency of
nondestructive testing (NDT) is recommended. The investigation should include
1) a review of human reliability in performing the tests and detecting potential
flaws and 2) a statistical assessment of the accuracy of the test performed in
actually detecting potential flaws.

The  test  types  presently  being  utilized  between  flights  involve: 


o Ultrasonic extensometer
o Ultrasonic leak
o Optical leak
o Laser interferometry
o Differential radiometry
o Holographic leak
o Resistivity monitoring
o Halogen leak
o Flow leak
o Mass spectroscopy
o Thermal leak
o Torquing
o Leak fluid
o Pressure decay
o Isotope thermometry
o Isotope tracers
o Borescoping
o Exoelectron emission
o Positron annihilation
o Electric current injection
o Eddy current
o Continuity checking
o X-ray radiography
o Polarimetry
o Hygrometer
o Optical pyrometry

Other specialized flaw detection methods may also be included.


Leak Detection


Sophisticated detection of potential thermal shock conditions or early leak
sensing prior to SRB ignition is critical to accident prevention. Following SRB
ignition, efforts should be focused on in-flight leakage and high pressure
turbopump cavitation prevention. In the PRA, high pressure turbopump cavitation
is assumed to result in pump explosion.

Pump cavitation detection must be responsive to the relatively short time
between transient initiation and pump explosion. Currently existing parametric
sensing such as pump suction pressure drop, excess vibration of the pump body,
pressure fluctuations throughout the propellant system and ullage pressure may
represent only a small number of detection schemes.

2.2.2 Failure Rate Data Base Development

The  computed  top  event  probability  depends  on  basic  event  failure  rates. 
Failure  rate  information  for  the  STS  hardware  was  found  to  be  fragmentary  and 
incomplete.  This  required  that  generic  data  be  used  to  supplement  STS  specific 
failure data.

It is strongly recommended that a failure data collection system be
established to facilitate future PRA and re-design activities. The main
components of this consolidated data base should include (as a minimum) the
following information:

o Hardware name/description (e.g., unique identifier)

o Hardware type (e.g., pneumatically actuated hydraulic valve,
turbopump, etc.)

o Failure history (i.e., time of failure, number of test hours/cycles,
time between failures, repair time)

o Each test must be described in sufficient detail so that its
significance for the estimation of the probability of failure under
operational conditions can be determined

o Reason for any testing that is not part of the preplanned test program

o Failure mode description/root cause evaluation

o Hardware duty cycle and operating modes
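The record contents listed above could be captured in a simple data structure. The following is only a sketch under stated assumptions: the class name, field names and the crude time-between-failures helper are hypothetical and are not taken from any NASA or contractor data base.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureRecord:
    """One entry in a hypothetical failure data collection system."""
    hardware_id: str            # hardware name/description (unique identifier)
    hardware_type: str          # e.g. "pneumatically actuated hydraulic valve"
    test_hours: float           # accumulated test or operating hours
    cycles: int                 # accumulated demand cycles
    failure_times_h: List[float] = field(default_factory=list)  # hours at each failure
    repair_times_h: List[float] = field(default_factory=list)   # hours per repair
    test_description: str = ""                   # detail needed to judge significance
    unplanned_test_reason: Optional[str] = None  # reason for non-preplanned testing
    failure_mode: Optional[str] = None           # failure mode / root cause evaluation
    duty_cycle: Optional[str] = None             # duty cycle and operating modes

    def failure_count(self) -> int:
        return len(self.failure_times_h)

    def mean_time_between_failures(self) -> Optional[float]:
        """Accumulated hours per failure; None when no failure is recorded."""
        n = self.failure_count()
        return self.test_hours / n if n else None
```

A record with no failures still carries useful information (its accumulated hours), which is why the helper returns None rather than raising an error in that case.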


Any STS failure rate data which can be compiled can be used to update
earlier estimates of performance based on generic data. It is important to
note, however, that the observed failures may constitute a statistically small
sample for analytical purposes. Caution should be taken to establish the
confidence interval when few failures are recorded or when few operating hours
without failure have been observed. Also, test data must be treated differently
from actual operating data.
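One common way to update a generic estimate with a small sample is a conjugate gamma-Poisson update. The sketch below is illustrative only and is not stated to be the method used in this study: it assumes a constant failure rate, a gamma prior expressed as a pseudo failure count over pseudo exposure hours, and hypothetical numbers. It does not distinguish test data from operating data, which the text above says must be treated differently.

```python
import math


def update_failure_rate(prior_shape, prior_hours, failures, hours):
    """Conjugate gamma-Poisson update; returns (posterior_shape, posterior_hours)."""
    return prior_shape + failures, prior_hours + hours


def posterior_mean(shape, exposure_hours):
    """Mean failure rate (per hour) of a gamma distribution with these parameters."""
    return shape / exposure_hours


def zero_failure_upper_bound(hours, confidence=0.95):
    """One-sided upper bound on a constant failure rate after `hours` with no failures."""
    return -math.log(1.0 - confidence) / hours


# Hypothetical generic prior: mean 1e-5 per hour, weighted as 0.5 pseudo-failures.
shape, hours = update_failure_rate(0.5, 5.0e4, failures=1, hours=2.0e4)
print(posterior_mean(shape, hours))
print(zero_failure_upper_bound(1000.0))
```

The zero-failure bound shows why few operating hours without failure say little: the bound shrinks only in proportion to the hours accumulated.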


The manpower required to implement this failure rate data base will be
dependent on initial setup efforts. Most of the cost will be incurred during
data base development and installation. Once in place, reporting and record
updating should average 2-4 hours per failure incident plus periodic updates to
record the operating log time for those components which have not experienced a
failure. Such updates primarily involve data transcription and require
significantly less than one hour of effort per component. Data transfer options
from contractor maintenance or maintainability data bases should be
investigated.


2.2.3 Improvement of Documentation System

It was the consensus among persons contributing to this study that
NASA's documentation system (for technical analyses, drawings and reports
regarding the MPPS and other systems) requires substantial improvement. This
is particularly important in the following areas:

o Centralization of Technical Data: Collection of the appropriate
documents/drawings to perform this study was time intensive. The
necessary documentation had to be obtained from a variety of NASA
organizations and subcontractors. No central coordination of such
documents was found.

o Document Control: A number of factual inconsistencies were identified
in and between the various documents utilized in this study.
The proper control of governing documents is essential to the
accuracy of the PRA results.

o Quality of Documents: In a number of instances, particularly those
relating to ground operation, the reproduction quality was extremely
poor. Better copies were often unavailable or non-existent within
NASA's documentation system.

Future PRA and other safety studies can be more effectively and
efficiently performed with the improved availability and quality of technical
documents.
2.3 STUDY LIMITATIONS

The fault tree model represents failures which, in themselves, or in
combination with other events, result in occurrence of the top event.
Initially, the fault tree top event is calculated based on point (single value)
estimates for basic event probabilities. The probabilities are based on the
point estimate of a statistically significant sample of recorded failures. Each
population of failed components has an associated distribution. This
information is lost, however, when only the point estimate is used.

Several methods are available to incorporate uncertainties about basic event
probabilities into a calculation of the top event probability. This can be
accomplished by Synthetic sampling, a variant of which is used in this
analysis. A less sophisticated but simpler alternate method used to determine
top event variance is by evaluating sensitivity. Sensitivity is tested by
varying specific basic event probabilities while maintaining others constant.
In this manner, a range of top event probabilities can be generated, thus
bracketing "worst" and "best" case conditions. Information regarding Synthetic
sampling and other sensitivity techniques is provided in Section 3.
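The idea of propagating basic event uncertainty to the top event can be sketched with plain Monte Carlo sampling. This is not the variant of Synthetic sampling used in the study, whose details are not given here. The three-event tree, the lognormal form, and all medians and error factors below are hypothetical assumptions chosen only for illustration.

```python
import math
import random


def sample_lognormal(median, error_factor, rng):
    """Draw from a lognormal whose 95th percentile is median * error_factor."""
    sigma = math.log(error_factor) / 1.645
    return median * math.exp(rng.gauss(0.0, sigma))


def top_event(p_leak, p_ignition, p_rupture):
    """Illustrative tree: (leak AND ignition) OR rupture, independent events."""
    p_fire = p_leak * p_ignition
    return 1.0 - (1.0 - p_fire) * (1.0 - p_rupture)


def simulate(n_trials=20000, seed=1):
    """Sample every basic event, recompute the top event, summarize the spread."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        p_leak = min(1.0, sample_lognormal(1e-3, 10.0, rng))
        p_ign = min(1.0, sample_lognormal(1e-1, 3.0, rng))
        p_rup = min(1.0, sample_lognormal(1e-5, 10.0, rng))
        results.append(top_event(p_leak, p_ign, p_rup))
    results.sort()
    return {
        "p05": results[int(0.05 * n_trials)],
        "median": results[n_trials // 2],
        "mean": sum(results) / n_trials,
        "p95": results[int(0.95 * n_trials)],
    }


print(simulate())
```

Because the sampled inputs are right-skewed, the mean of the top event distribution lies above its median, which is exactly the information a single point estimate discards.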

The generic failure data is based on field experience with a population of
well-maintained components and systems assumed to be in the useful midlife
performance range. The degraded reliability of parts due to wear-out, limited
life, or fatigue is not a part of the analysis because aerospace parts are
assumed to be properly inspected, tested, and maintained prior to launch.
Furthermore, the failure data from which the probabilities were derived are
based on experience with aerospace missile and satellite components.

Information regarding equipment duty cycle, modes of operation and environmental
stresses provides at most a "best estimate" of expected hardware performance on
the STS. Much of the uncertainty arises due to the translation of "per-hour"
failure rate data (much of which is expressed in failures per million hours of
operation) into a "per demand" or cyclic failure rate. This is a particularly
difficult problem in the derivation of failure probability values for equipment
required to operate in different modes during the various launch phases.
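A minimal sketch of one such translation follows. It assumes a constant failure rate (exponential model) within each operating mode, which is an assumption made here for illustration and not a statement of how the study performed the conversion. The rates and durations are hypothetical.

```python
import math


def per_demand_probability(failures_per_million_hours, exposure_hours):
    """Probability of at least one failure during one demand of the given duration."""
    rate_per_hour = failures_per_million_hours / 1.0e6
    return 1.0 - math.exp(-rate_per_hour * exposure_hours)


def mission_probability(phases):
    """Combine (failures_per_million_hours, hours) pairs for successive operating modes."""
    survive = 1.0
    for rate, hours in phases:
        survive *= 1.0 - per_demand_probability(rate, hours)
    return 1.0 - survive


# Hypothetical valve: 100 failures per million hours over a 0.15 hour demand.
print(per_demand_probability(100.0, 0.15))
# Same item operating in two modes with different rates and durations.
print(mission_probability([(100.0, 0.10), (200.0, 0.05)]))
```

For small rate-time products the result is close to rate times duration, so the difficulty noted above lies mainly in choosing a defensible rate and exposure time for each mode, not in the arithmetic.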

Latent failures are considered out of scope. Although latent failure rates
are generally insignificant compared with post launch failure rates, the
cumulative latent period for the total of all components under evaluation will
add to overall risk. In addition, ice plugging and contamination are excluded
from the PRA as out of scope because data were unavailable.


2.4 COMPARISON OF MPPS PRA TO EARLIER STUDIES AND TESTS

The value of a PRA depends heavily on the understanding of accident
sequences, and their subsequent quantification. A review of previous analytical
evaluations was crucial to both model development and quantification. A brief
description of some of the documents examined during the course of this study is
included below.


A number of previous safety and functional studies have been performed by
NASA contractors to assess the potential risks associated with the MPPS. A few
of these studies used in this PRA are provided in Table 2-4. These studies
provide only a deterministic or qualitative assessment of potential risks. No
probabilities have been assigned to the postulated accident sequences.
The MPPS FMEAs (Refs. 8, 9, 18 and 30) were the most important guides to
understanding component failure modes, causes and system effects. This
information was also valuable in identifying certain failures which may be
induced by human errors in the man-machine interfaces. The FMEAs are
particularly useful to PRA in that they can be used to verify the accident
sequences generated in the risk model.


A comprehensive review of FMEA single point failures was performed to
ensure that the safety-related FMEA items were properly included in the risk
models; the resulting consistency check is documented in Appendix H. A
description and cross index between the FMEA single point failures and fault
tree basic events are provided to facilitate this cross check. More discussion
on details regarding the risk model is provided in Section 4.


2.5  SUMMARY  OF  ANALYTICAL  APPROACH 


The analysis is based on a deductive logic procedure called fault-tree
analysis. A fault tree is a graphical representation of all conceivable
accident sequences which can lead to a system level catastrophe. The fault tree
model consists of hardware failure, human error and environmental contributors
to the system level catastrophe.


In risk analysis, the top event is typically a system-level accident such
as "loss of life and/or vehicle". The undesirable top event is successively
reduced to a combination of lesser failures represented in the lower branches.
The lowest events depicted in a fault tree are represented by rectangles,
circles, and diamonds. The diamond is used to indicate an event which could be
further reduced but which is not, to simplify the depicted fault tree
structure. The individual failures which are not further reduced (basic events)
are represented by circles. The rectangle is used to indicate an intermediate
event to be further reduced to basic events. The triangle is used to indicate a
continuation of the fault tree. Figure 3-2 contains a depiction of fault tree
symbols and terminology.


Boolean algebra is used to depict the relationships amongst the failures.
The "and" gate indicates the events all of which are necessary to produce the
next higher event in the tree. The "or" gate indicates events any one of which
is sufficient to produce the next higher event.
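The gate logic described above translates directly into probability arithmetic when the input events are assumed to be statistically independent. That independence is an assumption of this sketch, and the function names and numbers are illustrative only.

```python
from functools import reduce


def and_gate(probabilities):
    """All inputs must occur: product of independent probabilities."""
    return reduce(lambda acc, p: acc * p, probabilities, 1.0)


def or_gate(probabilities):
    """Any one input suffices: complement of the chance that none occur."""
    none_occur = reduce(lambda acc, p: acc * (1.0 - p), probabilities, 1.0)
    return 1.0 - none_occur


# Next higher event = (event A AND event B) OR event C, with hypothetical values.
p_top = or_gate([and_gate([0.1, 0.2]), 0.01])
print(p_top)
```

Gates can be nested in this way to evaluate a whole tree from the bottom up, provided no basic event appears in more than one branch. Repeated events require the cutset treatment described next.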


In addition to the calculation of top-event probabilities, cutsets are
generated. A cutset is a collection of basic events sufficient to cause the
occurrence of the top event. A minimal cutset has the property that no proper
subset of it is also a cutset. The collection of minimal cutsets provides
qualitative information about the vulnerability of the system. In the absence
of failure data it can be said that the vulnerability of a system increases as
the cutset size decreases and the number of cutsets increases.
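The definition of a minimal cutset can be made concrete with a small recursive expansion. The sketch below is an illustrative brute-force approach suitable only for small trees. It is not the cutset algorithm used in the study, and the tree and event names are hypothetical.

```python
from itertools import product


def cut_sets(node):
    """Return the cut sets of a tree as a set of frozensets of basic events.

    A node is either a basic event name (str) or a tuple
    ("AND" | "OR", child, child, ...).
    """
    if isinstance(node, str):
        return {frozenset([node])}
    gate, *children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any child's cut set causes the gate output.
        return set().union(*child_sets)
    if gate == "AND":
        # One cut set from every child must occur together.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(node):
    """Drop every cut set that strictly contains another cut set."""
    sets = cut_sets(node)
    return {s for s in sets if not any(other < s for other in sets)}


tree = ("OR",
        "RUPTURE",
        ("AND", "LEAK", "IGNITION"),
        ("AND", "LEAK", ("OR", "IGNITION", "RUPTURE")))
print(minimal_cut_sets(tree))
```

In this example the single-event cutset marks a single point failure, which is the kind of qualitative vulnerability information the text refers to.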


The dominant cutsets identified in the FTA are then used to quantify event
tree branches. An event tree is a success/failure model defining the possible
outcome or consequence states based on sequential or time dependent conditions.
This defines the time-phased risk as well as recovery factors (such as abort
scenarios) which cannot be easily depicted on a single fault tree model.
Details regarding FTA and event tree development are provided in Sections 4 and
6.
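The success/failure branching of an event tree can be sketched as follows. The branch names and probabilities are hypothetical, and the sketch expands every branch at every node, whereas a real event tree would normally prune paths on which a later branch is irrelevant.

```python
def event_tree(initiator_probability, branches):
    """Enumerate outcome probabilities for sequential success/failure branches.

    branches: list of (name, failure_probability), evaluated in order.
    Returns {tuple of (name, "success" | "failure"): probability}.
    """
    paths = {(): initiator_probability}
    for name, p_fail in branches:
        extended = {}
        for path, prob in paths.items():
            extended[path + ((name, "success"),)] = prob * (1.0 - p_fail)
            extended[path + ((name, "failure"),)] = prob * p_fail
        paths = extended
    return paths


# Hypothetical sequence: a leak occurs, then detection and shutdown may each fail.
outcomes = event_tree(1.0e-3, [("leak detected", 0.10), ("safe shutdown", 0.05)])
for path, prob in outcomes.items():
    print(path, prob)
```

The outcome probabilities always sum to the initiator probability, which gives a simple check on any hand-built tree.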
2.6 SCOPE OF ANALYSIS

The scope of this PRA includes the MPPS and major portions of associated
support systems. The MPPS consists of the ET, aspects of the Space Shuttle Main
Engine (SSME), and those components of the Orbiter which connect the ET to the
SSME and provide the necessary services for proper MPPS functioning.

Ground operations commencing at eight hours prior to launch were examined
for impact on the launch but not quantified in the analysis. Table 1-3
contains some of the major risks not included in this PRA.

2.6.1 Basis for Inclusion of Events

The MPPS is an integral part of the Main Propulsion System (MPS). It is
not, however, a separate and distinct system with strictly defined boundaries
and interfaces. Reference to a "system" is merely a convention which recognizes
a requirement within the MPS for gaseous pressurants. Thus, the MPPS is defined
herein for the specific purpose of performing a PRA. The "system" is comprised
of various MPS pressure-related functions with interfaces included for
analytical completeness. The study participants recognize that there may be
differing definitions of the MPPS, based on historical boundaries and/or
contractor responsibilities.

The scope of the MPPS for the purpose of this PRA was based on various
analytical, as well as engineering considerations. As a general rule, an
element is in scope if its failure directly fails an element of the MPPS or if
it is directly failed by an MPPS element. Out of scope elements include
component failures outside the MPPS which lead directly to loss of life and/or
vehicle and for which subsequent failure of the MPPS is irrelevant. The
interconnection of piping and control systems necessitates that subsystems which
interface directly with gaseous pressurants be included within scope for this
PRA. Interactions between systems and spatial dependency of major components
can cause failures in the MPPS which ultimately lead to loss of life and/or STS
vehicle. A general itemization of hardware and human activities considered
in-scope and out-of-scope is provided in Tables 2-5 and 2-6, respectively.


2.6.2 Specific Scope Boundaries

A number of scenarios involving interfaces with the MPPS must be examined
for analytical completeness. Evaluation of failures occurring strictly within
the MPPS and affecting only the MPPS hardware addresses only a small fraction of
total risk contributions to the STS.

The general categories of items considered within analytical scope may be
summarized as follows:

1) Events affecting MPPS pressurization functions. This
category includes all failures which lead to loss of
pre-pressurization and re-pressurization functions.
Additionally, this category includes any and all events
which create insufficient ullage pressure conditions in the
external propellant tanks.
2) All portions of the closed process loop containing the
MPPS. Interconnections with main engines, orbiter piping
and external propellant tanks create a single pressure
boundary and flow loop. Loss of pressure boundary or flow
through the closed loop necessarily affects MPPS function.

3) Hardware and human actions which directly support MPPS
hardware or in-scope hardware defined by 2). This category
includes local electrical control signals, pneumatic
(helium) system lines and components, and hydraulic lines
directly supporting main engine valve hydraulic actuation.
Excluded from this category are all controller/general
purpose computer failures and all hydraulic supply and servo
control failures (including hydraulic control of yaw
and pitch functions).

4) Elements contained clearly within the MPPS boundaries.
These include flow control valves, HPOT heat exchangers,
heat exchanger bypass flow orifices and gaseous oxygen (GO2)
and gaseous hydrogen (GH2) pressurization lines and
components.

5) Events that challenge the MPPS, requiring a response. For
example, MECO requires closures of the prevalves which are
in scope. The duty cycle, process conditions and
environmental stresses greatly affect the ability of the MPPS to
perform its pressurization functions. By evaluating the
impact these factors have on the MPPS, one can establish the
number of valve actuations, pressure transients or flow
restrictions that the MPPS hardware will experience. Those
influences are, for the most part, external to the MPPS.

6) Events which define MPPS success criteria. These are
primarily  functional  failures  in  the  main  engine  which 
establish  whether  the  MPPS  can  function  under  specified 
conditions.  For  example,  loss  of  more  than  two  engines, 
aside  from  failing  to  provide  proper  thrust,  may  also  result 
in  insufficient  ullage  pressure. 

7) Crew or ground control actions which cause or mitigate MPPS
failures  or  failure  of  hardware  defined  by  2),  3),  and  4). 
These  are  exclusively  errors  of  omission  or  failures  to 
respond  when  required.  Errors  of  commission  which  induce  a 
failure  are  not  within  the  scope  of  analysis. 

8) Miscellaneous hardware included for analytical completeness
and to account for symmetry between subsystems. Issues of
symmetry arise frequently when comparing the liquid oxygen
(LO2) and liquid hydrogen (LH2) propellant systems. The
HPOT preburner has an internal heat exchanger which provides
a pressurization function within the MPPS. The HPOT
preburner, therefore, is in-scope. The High Pressure Fuel
Turbopump (HPFT) has no analogous heat exchanger on its
preburner, but the HPFT preburner is included within scope
for analytical completeness.
Table 2-2

Recommendations Based on the Probabilistic Risk Assessment


Risk Source: Leakage through seals in components and piping

Recommended Action: Seal leakage-related failure rates are two orders of
magnitude higher than welded connection failure rates. An evaluation of
whether welds can be used in lieu of flanged connections (involving O-ring or
other seals) should be performed. The trade-off of this modification is that
one sacrifices serviceability when connections between components and piping
are welded together.

Risk Source: Release of ignitable materials into aft compartment

Recommended Action: Evaluate the addition of an in-flight leak detection
system in the aft compartment. The leak detection system can be used as a
shutdown parameter input to the engine controller.

Risk Source: Depressurization of helium pneumatic system in aft compartment

Recommended Action: Evaluate the adequacy of the aft compartment vents in
relieving overpressurization conditions. Of specific concern are scenarios in
which the high pressure helium supply system pressure boundary is breached.

Risk Source: Bleed valve/antiflood valve failures

Recommended Action: Evaluate options to ensure that these valves assume their
proper position during flight. Increased functional testing prior to launch
preparation and cryogen detection in the bleed line may prevent an
overpressurization explosion event during engine start, although it is
recognized that excessive functional testing may actually degrade reliability.

Risk Source: Pneumatic system pressure regulator failure

Recommended Action: Evaluate options to automatically isolate the regulator
from the downstream system upon a high pressure detection via an overboard
vent.

Risk Source: All other failures

Recommended Action: No action recommended as the failure rates are
sufficiently low to contribute negligibly to overall risk.


TABLE  2-3 


SOURCES  OF  RISK  EXCLUDED  FROM  PRA  INVESTIGATION 


GENERAL

• External events (specifically natural phenomena such as lightning and strong
winds).

• Propellant hydrodynamic transients.

• Structural failure of ET under dynamic loadings.

• Spatial interactions between structural components.

• Latent flaws and common cause failures introduced during repair and
refurbishment.

• Common cause failures of sensors which occur during flight due to power supplies,
control systems and other hardware interactions.

• Piping and tubing failure mechanisms.

• End-of-life, wear-out and fatigue characteristics of major mechanical components.

• Valve sequencing failures.


SPECIFIC

• Turbine blades on high pressure pumps should be examined for potential redesigns to
prevent missile generation.

• Yaw and pitch control subsystem's ability to compensate for loss of a single engine.

• Detailed evaluation of controller and engine interface unit internal architecture.

• Thermal shock in piping downstream of SSME prevalves.
TABLE 2-4

Summary of Previous
SSMP Risk-Related Studies


Type of Study: Element Interface Functional Analysis (EIFA)

Reference (Section 7): 7

Information Contained in Study:

o Qualitative analysis of systems and their interfaces.

o Examines effects of failure in one system on other related
systems.

o Also identifies non-redundant failure points and assigns
criticality levels (1-3) to items.

Type of Study: FMEA

Reference (Section 7): 8 External Tank; 9 SSME & Critical Items List;
18, 30 Orbiter

Information Contained in Study:

o Qualitative analysis of failure modes and effects.

Type of Study: Hazard Analysis

Reference (Section 7): 10 External Tank; 11 SSME design - operational
flights; 12 LO2 Control System; 13 LH2 Control System

Information Contained in Study:

o Summary of applicable hazards, precautions and remedies
in a system. Identifies hazards, controlled and
eliminated. Evaluates systems and responses on a
deterministic (qualitative) level.

o Utilizes limited qualitative fault tree modelling.

Type of Study: Maintenance Study

Reference (Section 7): 14 Reusable Rocket Engine; 15 SSME Combustion
Chamber

Information Contained in Study:

o Summary of SSME and other liquid rocket motor failures.

o Recommends controls for reducing failures.
TABLE 2-5

SUMMARY  OF  PRA  SCOPE 


Time  Interval:  T-8  hours  to  MECO/ET  separation  or  intact  abort  initiation 


Risk  Sources:  • Hardware  Failures 

• Human  Errors 


Consequence  Categories:  • Loss  of  Human  Life 

• Loss  of  Vehicle 


Hardware Included: • Piping, tubing, valves, pumps and other components
forming the MPS pressure boundary inside the orbiter
and SSME compartments, External Tank and Orbiter
umbilicals.

• Support  systems 

- pneumatic  subsystem 

- hydraulic  subsystem  (select  functions) 

- local control circuitry and SSME controllers (select

• ET  separation  pyrotechnics 

• MPPS dedicated instrumentation and sensors

• Ground support equipment associated with LH2 and LO2 fill and He
pre-pressurization operations

operations. 
TABLE 2-6

SUMMARY OF RISK SOURCES EXCLUDED
FROM MPS PRA


• Solid Rocket Booster (SRB) System and any SRB interfaces to the External Tank

• Structural failures (except for failures of MPS propellant, pneumatic and lines).

• Events external to the STS (natural or other)

• Latent design inadequacy, workmanship, installation or servicing defects introduced prior
to T-8 hours

• Sabotage and security violations

• Primary failures outside the MPS which induce secondary failures in MPS

• General Purpose Computer, Main Engine Controller, Engine Interface Unit and Cockpit Display
Control Failures

• Software and firmware induced failures

• Electrical power supply and distribution

• Cabling, wiring or connector-related failures.

• Wear-out (e.g., end-of-life failures)

• Delayed accidents (i.e., after orbiter/ET separation) resulting from a failure occurring
during the time interval T-8 hours to ET separation.

• Thrust vector adjustment-related failures (e.g., gimballing, throttle-up, thruster collision,
yaw/pitch actuators)

• Cryogenic leakage spraying on adjacent components causing temperature decrease below safe
operating limits.


Figure 2-1

Time-Phased Risk Profile
(per launch) *

[Plot not recoverable from the scanned source.]

* Based on event tree results in Tables 6-1a and 6-1b.
** Ground accident. Outside scope of analysis.
Section 3

METHODOLOGY 


PRAs rely on both FTA and event tree analysis (ETA) to quantify risk. FTA
is used to establish the logical relationship between a system level failure and
all conceivable combinations of component level failures which cause it. ETA
relates each adverse outcome possibility with the time sequence of systems or
subsystems failures which cause it.

The fault-tree model is structured from a knowledge of system operation
and previous FMEA/hazards studies (Table 2-4). Data is derived from various
failure rate or probability data bases (Table 3-1). These are the input values
for the fault-tree model.


The fault-tree model can then be used to generate a "top event" (e.g., loss
of life and/or vehicle) probability. Because the input data is generic (i.e.,
not specifically based on STS hardware failure history), an evaluation of FTA
model and failure rate data sensitivity is needed. Sensitivity analysis
involves perturbing the input data to determine its effect on the top event.
Sensitivity analysis can be performed by altering the fault-tree model to test
different assumptions. Sensitivity analyses may involve modifications, such as
the deletion/addition of fault-tree branches, changing gate logic, and changing
the success criteria. The recomputed top events provide some insight into the
sensitivity of the model to the parametric or structural changes. In addition,
one can vary the value of a specific failure rate input. In this manner, one
can begin to bracket the top event probability range. Subtleties associated
with sensitivity calculations are discussed in more detail in Section 4.
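Varying one failure rate input at a time can be sketched as follows. The three-event tree, the base values and the factor of ten are hypothetical assumptions for illustration and do not come from the study's model or data.

```python
def top_event(p):
    """Illustrative tree: (leak AND ignition) OR rupture, independent events."""
    fire = p["leak"] * p["ignition"]
    return 1.0 - (1.0 - fire) * (1.0 - p["rupture"])


def bracket(base, name, factor=10.0):
    """Recompute the top event with one input divided and multiplied by `factor`."""
    low = dict(base, **{name: base[name] / factor})
    high = dict(base, **{name: min(1.0, base[name] * factor)})
    return top_event(low), top_event(base), top_event(high)


def rank_sensitivity(base, factor=10.0):
    """Order inputs by the ratio of worst-case to best-case top event probability."""
    ratios = {}
    for name in base:
        best, _, worst = bracket(base, name, factor)
        ratios[name] = worst / best
    return sorted(ratios, key=ratios.get, reverse=True)


base = {"leak": 1.0e-3, "ignition": 1.0e-1, "rupture": 1.0e-5}
print(bracket(base, "leak"))
print(rank_sensitivity(base))
```

Inputs at the end of the ranking move the top event least, so refining their failure rate data yields the smallest reduction in the bracketed range.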


The results of FTA are then used in the final ETA computations, where
consequences are factored in based on the time of accident and the probability
of loss of life (respectively, loss of vehicle). If, for example, the accident
occurs when the STS is still on the launch pad, hardware losses will be greater
than once the STS has cleared the launch facility. The time-phased aspect of
consequences and ETA in general are provided in Section 6.


These  activities  and  results  are  presented  in  Figure  3-1. 


3.1  MODEL  DEVELOPMENT 

A fault tree model is based on Boolean mathematics. That is, logical
operators consisting primarily of and, or, not, and combination gates are used
to represent the paths from lower order events to the top event. A description
of Boolean logic symbology and terminology is provided in Figure 3-2. This
standard convention is used for all FTA in this report.

A brief description of fault tree organization and an example of the method
in which major MPPS components were accounted for in the FTA are provided below.
3.1.1 Fault Tree Organization

The fault tree model (Figure 3-3) was developed from the review of various
hardware and systems evaluations. Flow schematics were first analyzed
to provide a basic understanding of the main process flow of the LH2 and LO2
propellant lines. Major components including pumps, valves and other active
devices which are required to deliver the propellant to the main engine burners,
and their actuating controllers are depicted in Figure 3-4. Components
providing a secondary function and systems supporting the MPPS are not depicted
in Figure 3-4, but include the hydraulic, pneumatic and electrical controls.

The complete fault tree, along with descriptions of its basic events, is
included in Appendix C.

In general, the fault tree model was structured in three parts: ground,
launch and MECO/ET separation failures. These three time sequences represent
the actual time of occurrence of the top event. Failures occurring during the
pre-flight phase include human errors; for example, fire/explosion at the
launch site could be the result of human error during maintenance, repair or
flight preparation activities. Human error on the ground prior to launch can
also manifest itself as a latent failure during the actual flight; i.e.,
improper execution of prelaunch tasks can cause or allow an undetected problem
to contribute to the manifestation of a catastrophic failure during the first
few minutes of the launch. In contrast with the ground operation failures,
flight failures consist primarily of hardware problems, because there is such
limited opportunity for inflight human error.

To ensure that all previously identified single point failures have been
included in the fault tree, a cross reference check with FMEAs and HAs is
performed. That is, every criticality 1 event in the FMEA/HA is cross indexed
to a specific basic event or gate. In some cases an FMEA item may appear in
multiple branches of a tree. For details regarding this consistency evaluation,
refer to Appendix H.

NOTE: Figure 3-3 is an edited version of the fault tree, provided for
convenience. Branches which do not add significantly to understanding the model
are not provided. For example, all redundant branches associated with the
center and right engines are not included, as the left engine is typical. The page
numbers are identical with those in Figure D-2, the expanded fault tree. Figure
3-2 should be consulted for transfer gates on pages not provided.


3.1.2 Example of Model Development

Structuring a fault tree from engineering diagrams is not, however, a
straightforward process. Hardware failure modes, failure rate data, and
time-phased operation greatly affect the manner in which the component is
treated within the fault tree. To illustrate this point, a simplified LH2/LO2
pump valve schematic was developed (Figure 3-4). The component and major piping
lines represent the main process flow and pressurization functions associated
with the LH2 and LO2 propellant subsystems.
There are many subtleties regarding the various operating modes and
requirements of the hardware. However, the schematic does provide a
starting point and a basic description of those components most strongly
affecting system operation.


Component failures may appear on different branches of the fault tree
depending on their impact on STS. This is evident by examining the cross
reference notes shown in Figure 3-4. Each note (i.e., identification number
appearing next to the component) corresponds to a basic event or basic events in
the fault tree model. The failure mode and time of failure are clearly very
important factors in determining in which branches of the tree each component
belongs. More discussion regarding fault tree organization is contained in
Appendix D.


3.2 DATABASE DEVELOPMENT

3.2.1 Component Failure Rates

Component failure rates are based on widely used generic sources. Rome Air
Development Center (RADC) documents provide most of the data used in this
analysis. A summary of RADC and other failure rate documents is provided in
Table 3-1. Whenever available, however, shuttle specific data is utilized to
establish failure rates.

Electronic component failure rates are primarily based on MIL-HDBK-217E
with the failure mode allocation determined by IEEE Standard 500. That is, the
base failure rate is calculated for the parameters and quality factors as
outlined in MIL-HDBK-217E. In a case when only selected component failure modes
lead to catastrophic system failures, the overall failure rate is adjusted
according to the failure mode allocation determined by IEEE Standard 500.

3.2.2 Basis for Exposure Times

The probability values used for each of the basic events depend on both
the failure rate and the exposure time. Assuming that the component
reliability decreases exponentially in time, the probability of failure is
calculated via the expression below:

$$P = 1 - \exp(-\lambda T)$$

Equation 3-1

Where $\lambda T$ is the product of the failure rate ($\lambda$) and the exposure
time ($T$).
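
As a worked illustration of Equation 3-1, the following minimal Python sketch evaluates the failure probability for a given failure rate and exposure time. The function name and the numerical values are hypothetical and are not taken from the data base used in this study.

```python
import math


def failure_probability(failure_rate_per_hour, exposure_hours):
    """Equation 3-1: P = 1 - exp(-lambda * T).

    math.expm1 is used so that very small values of lambda * T
    do not lose precision in the subtraction.
    """
    return -math.expm1(-failure_rate_per_hour * exposure_hours)


if __name__ == "__main__":
    rate = 1.0e-6        # hypothetical failure rate, per hour
    exposure = 0.1       # hypothetical exposure time, hours
    print(failure_probability(rate, exposure))
```

For small values of the product of failure rate and exposure time, the result is very close to the product itself, which is why the two are often used interchangeably for short exposure times.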

Many components are required to function on a "per-demand" basis. Examples
of components required on demand consist primarily of valve openings/closings,
pyrotechnic firings or human actions associated with operation. "Failure on
demand" means that at a discrete time, certain components must perform a
one-time action. "Per-demand" may alternatively be expressed as "per cycle".
In the case of a component required to operate on demand, total failure
probability is the product of the failure rate per demand and the number of
demands.
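
The per-demand calculation can be sketched as follows. The product form is the one stated in the text; the second function is an added comparison, assuming independent demands with a constant per-demand failure probability, and is not part of the original analysis. All names and values are hypothetical.

```python
def demand_failure_product(rate_per_demand, n_demands):
    """Product form used in the text: failure rate per demand times number of demands."""
    return rate_per_demand * n_demands


def demand_failure_independent(rate_per_demand, n_demands):
    """Comparison (assumption): probability of at least one failure in n independent demands."""
    return 1.0 - (1.0 - rate_per_demand) ** n_demands


if __name__ == "__main__":
    rate = 1.0e-3   # hypothetical failure probability per demand
    demands = 5     # hypothetical number of demands
    print(demand_failure_product(rate, demands))
    print(demand_failure_independent(rate, demands))
```

The product form is always at least as large as the independent-demand value, so it errs on the conservative side, and the two agree closely when the per-demand rate is small.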

The exponential term in Equation 3-1 may be used to estimate per demand failures
by setting the exposure time for that basic event equal to the entire time prior
to the occurrence of the on-demand requirements. For example, the prevalves and
the engine hydraulic valves isolate engines in redline condition on demand. The
demand to shut down a particular engine may occur at any time prior to MECO.
Therefore, the cumulative exposure time is the time between start of ignition
sequence to MECO or approximately 5.1 minutes. Note that this estimate may fail
to be conservative if the "on-demand" requirements of the data base components
are less than those of the MPPS components.

The time-phased nature of the fault tree requires that for some component
failures (e.g. those which could occur at any time to contribute to the top
event) separate exposure times must be established for each time segment.
Break-up of other time phases not shown in the fault tree is required for the
purpose of consequence analysis. Details regarding the partitioning of time
intervals are provided in Section 7. A tabulation of the time-phased
probabilities is contained in Appendix C.


3.3  PROBABILISTIC  COMPUTATIONS 
3.3.1  CAFTA  Code 

All fault tree computations are performed using the CAFTA code. CAFTA is a
microcomputer-based program which performs fault tree analysis on a system or
group of systems. The program includes a fault tree editor for building and
updating fault tree models, and a reliability data base for storing all basic
events used in the models. A brief description of the code capabilities and
limitations is provided in Appendix I.

CAFTA relies on FTAP algorithms to generate the minimal cutsets. The
complexity of the fault-tree model prohibits the generation of all possible
minimal cut sets. Therefore, a truncation limit (10^-8) is defined to eliminate
consideration of very low probability sequences.

CAFTA  truncates  low  probability  sequences  in  a bottom-top  manner  at  each 
level  of  fault  tree  intermediate  events.  Intermediate  events  above  the  cut-off 
threshold  remain  for  inclusion  in  cutset  reduction.  The  sum  of  minimal  cutset 
probabilities  conservatively  approximates  the  top  event  probability. 
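
The truncation and summation steps can be illustrated with a short Python sketch. It is a simplification under stated assumptions: basic events are taken to be independent, truncation is applied directly to the list of minimal cutsets (rather than bottom-up at each intermediate event as CAFTA does), and the cutsets and probabilities are hypothetical. The exact value by state enumeration is included only to show that the cutset sum is conservative for this small example.

```python
from itertools import product
from math import prod


def cutset_probability(cutset, p):
    """Probability of one cutset, assuming independent basic events."""
    return prod(p[event] for event in cutset)


def truncated_cutset_sum(cutsets, p, truncation=1.0e-8):
    """Drop cutsets below the truncation limit and sum the remainder."""
    kept = [c for c in cutsets if cutset_probability(c, p) >= truncation]
    return kept, sum(cutset_probability(c, p) for c in kept)


def exact_top_event(cutsets, p):
    """Exact top event probability by enumerating all basic event states."""
    events = sorted(p)
    total = 0.0
    for states in product([False, True], repeat=len(events)):
        failed = {e for e, s in zip(events, states) if s}
        if any(set(c) <= failed for c in cutsets):
            weight = 1.0
            for e, s in zip(events, states):
                weight *= p[e] if s else 1.0 - p[e]
            total += weight
    return total


if __name__ == "__main__":
    # Hypothetical basic event probabilities and minimal cutsets.
    p = {"A": 1.0e-3, "B": 2.0e-3, "C": 1.0e-2, "D": 1.0e-7, "E": 1.0e-3}
    cutsets = [("A",), ("B", "C"), ("D", "E")]
    kept, estimate = truncated_cutset_sum(cutsets, p)
    print(kept, estimate, exact_top_event(cutsets, p))
```

In this example the cutset ("D", "E") has probability 10^-10 and is truncated, while the sum over the two remaining cutsets still bounds the exact value from above.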

It was judged that cutsets with probabilities below 10^-8 are negligible
contributors to top event occurrence and can be eliminated from consideration,
as the probability of the top event is on the order of 2.5 x 10^-3.


3.3.2  Importance  Measures 


Importance, in the probabilistic sense, refers to the significance which a
basic event (or minimal cutset) has towards the outcome of a top event.
Consider the importance of a lowest level event (i.e., basic event) to the top
event. Basic event "i" (BEi) will appear in at least one sequence or cut set
which leads to the top event. Most likely, however, a basic event will appear
in many cut sets as defined by the tree branches. The Boolean sum of the
probability of all such sequences determines the probability of the top event.

Six importance measures are calculated by CAFTA's cut set editor;
these are Fussell-Vesely, Birnbaum, Risk Achievement Worth, Risk Reduction
Worth, Criticality and Structural Importance. Each of these different
importance measures provides different information. For this PRA, only the
Fussell-Vesely and Structural Measures of Importance are used, so only these are
defined.


Fussell-Vesely Importance

The collection of cut sets Kj where BEi is contained in Kj is used to
compute the importance of BEi. The ratio of the probability of all sequences in
which a given basic event occurs to the total top event probability determines
the importance of the basic event.


The Fussell-Vesely measure of basic event importance is effectively a
weighting function with numerical value between 0 and 1. The Fussell-Vesely
measure, $I_{FV}(BE_i)$, is defined by the following ratio:

$$I_{FV}(BE_i) = \frac{P(\text{Boolean union of minimal cut sets containing } BE_i)}{P(\text{Top Event})}$$

Equation 3-2a

If a basic event is contained in every minimal cut set (also called min cut
set) then its importance value is unity. Stated in other terms, the
Fussell-Vesely importance is the conditional probability that a min cut set
containing the basic event occurs given the occurrence of the Top Event. The
Fussell-Vesely importance is therefore computed as:

$$I_{FV}(BE_i) \approx \frac{\sum \text{probabilities of min cut sets containing } BE_i}{\sum \text{probabilities of all min cut sets}}$$

Equation 3-2b
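
The cut set form of the Fussell-Vesely measure (Equation 3-2b) can be sketched in a few lines of Python. This is an illustration only, assuming independent basic events; the cutsets, event names and probabilities are hypothetical, and the function is not CAFTA's implementation.

```python
from math import prod


def fussell_vesely(event, cutsets, p):
    """Ratio of summed probabilities of min cut sets containing `event`
    to the summed probabilities of all min cut sets (Equation 3-2b)."""
    probabilities = [prod(p[e] for e in cutset) for cutset in cutsets]
    containing = sum(q for cutset, q in zip(cutsets, probabilities) if event in cutset)
    return containing / sum(probabilities)


if __name__ == "__main__":
    # Hypothetical basic event probabilities and minimal cut sets.
    p = {"A": 1.0e-3, "B": 2.0e-3, "C": 1.0e-2}
    cutsets = [("A",), ("B", "C")]
    for event in sorted(p):
        print(event, fussell_vesely(event, cutsets, p))
```

In this example event A dominates because its singleton cut set carries almost all of the top event probability, while B and C share the small remainder equally since they appear only together.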

Equation 3-2a represents a universally applicable Fussell-Vesely
relationship between a specific basic event and the top event. This
relationship is useful in expressing how much attention should be given to each
basic event within the fault tree. Note that the importance of BEi as defined
by Equation 3-2a will vary with the choice of different top events. A weakness
of the Fussell-Vesely measure of importance is that it depends on failure data
which may be uncertain (as in this study).
Structural Importance

A second measure of importance provides qualitative ranking. This measure is
often called the structural importance because computation of the importance
value is dependent upon the structure of the fault tree rather than upon
probabilities assigned to basic events. This measure is therefore useful in
establishing a first order ranking of significant basic events, when there is
either scarcity of failure data or much uncertainty.

Structural Importance is defined as the fractional number of system states
that are critical for a component. A critical system state for a component is
defined as a system state such that the system makes a transition from the
unfailed to failed state when the component fails, or more generally the top event
occurs when a basic event occurs. This can be computationally approximated by
the expression below:


$$I_S(BE_i) = \Big[\textstyle\sum \text{probabilities of all cut sets} \;\Big|\; P(BE_i) = 1.0,\ P(BE_j) = 0.5\Big] - \Big[\textstyle\sum \text{probabilities of all cut sets} \;\Big|\; P(BE_i) = 0.0,\ P(BE_j) = 0.5\Big]$$

Equation 3-3

Where $P(BE_j) = 0.5$ signifies that all basic events except for $BE_i$ are
assigned a probability of 0.5.
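
For small trees, the definition of structural importance (the fraction of system states that are critical for a component) can be evaluated exactly by enumeration. The Python sketch below does this; it is not the cut set sum approximation of Equation 3-3 and not CAFTA's algorithm, and the cutsets and event names are hypothetical. Counting each state of the other basic events equally is equivalent to assigning each of them a probability of 0.5.

```python
from itertools import product


def top_event_occurs(failed, cutsets):
    """The top event occurs if every event of at least one cutset has failed."""
    return any(set(cutset) <= failed for cutset in cutsets)


def structural_importance(event, events, cutsets):
    """Fraction of states of the other basic events in which `event` is critical."""
    others = [e for e in events if e != event]
    critical = 0
    for states in product([False, True], repeat=len(others)):
        failed = {e for e, s in zip(others, states) if s}
        with_event = top_event_occurs(failed | {event}, cutsets)
        without_event = top_event_occurs(failed, cutsets)
        if with_event and not without_event:
            critical += 1
    return critical / 2 ** len(others)


if __name__ == "__main__":
    events = ["A", "B", "C"]
    cutsets = [("A",), ("B", "C")]  # hypothetical minimal cutsets
    for event in events:
        print(event, structural_importance(event, events, cutsets))
```

When every minimal cutset is a singleton, a given event is critical only in the one state where all other events are unfailed, so for n singletons the enumerated value is 1/2^(n-1) and shrinks rapidly as n grows.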

Structural importance measures are most useful in fault tree structures in
which minimal cutsets are comprised of doubletons (i.e. two basic events) or
higher order cutsets. Minimal cutsets in which only singletons exist
necessarily have a computed value of zero by structural measure. Furthermore,
minimal cutsets consisting primarily of singletons combined with other events
which appear in only one or a few cutsets, have computed values at or near zero. It
is, therefore, very important to understand the cutset conditions and
composition which exist prior to computing the structural importance value.

It is important to realize, however, that structural importance is not
computed entirely independent of probabilities assigned to basic events. CAFTA
uses a truncation value to eliminate cutsets with probability below a specified
limit. The structural importance measure is calculated only for those basic
events which appear in cutsets above the truncation value. Structural
importance can provide a useful tool for ranking. The reader is cautioned,
however, to exercise careful judgement when analyzing importance of basic events
which appear in cutsets at or near truncation value.

Each relationship provides valuable information regarding the significance
of hardware and human actions in precluding a major accident. Prioritization
based on the ranking scheme is a useful tool for possible upgrades,
modifications and procedural changes.

A summary of the highest ranking basic events and their respective
importance values is provided in Section P and in Appendix T.

More discussion on the application of this measure is provided in Section 4.

3.3.3  Synthetic  Sampling  Statistics 

The  unavailability  of  STS-specific  failure  rate  data  can  be  addressed 
mathematically  through  a synthetic  sampling  method  discussed  below. 

The variable of interest in this study is the top event (e.g. loss of life
and/or vehicle). The top event (TE) is a function of basic events BE1,
BE2...BEk. The function is complicated and represents the sum total of all cut
sets above the specified cut set truncation limit. The question to be
investigated is: How does TE vary when the BE's vary according to their
individual probability distributions? Related questions are: What is the
expected value of TE? What is the 90th percentile of TE? etc.

By sampling repeatedly from the individual probability distributions of
the BE's and evaluating TE for each sample, a probability distribution for TE is
produced. This PRA will utilize the Top Event Matrix Analysis Code (TEMAC) for
the statistical computations needed to generate the TE distribution.
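
The sampling procedure can be sketched in Python as follows. This is an illustration of the general idea only and does not represent TEMAC. It assumes, for the sake of the example, that each basic event probability follows a lognormal distribution described by a median and a common error factor (ratio of the 95th percentile to the median), and that the top event is approximated by the sum of cutset probabilities; all names and numbers are hypothetical.

```python
import math
import random


def sample_top_event(cutsets, medians, error_factor=3.0, n_samples=5000, seed=0):
    """Return a sorted list of sampled top event probabilities."""
    rng = random.Random(seed)
    # For a lognormal, the 95th percentile is median * exp(1.645 * sigma).
    sigma = math.log(error_factor) / 1.645
    samples = []
    for _ in range(n_samples):
        p = {
            name: min(1.0, rng.lognormvariate(math.log(median), sigma))
            for name, median in medians.items()
        }
        top = sum(math.prod(p[name] for name in cutset) for cutset in cutsets)
        samples.append(min(1.0, top))
    samples.sort()
    return samples


def percentile(sorted_samples, fraction):
    """Simple empirical percentile of an already sorted sample list."""
    index = min(len(sorted_samples) - 1, int(fraction * len(sorted_samples)))
    return sorted_samples[index]


if __name__ == "__main__":
    medians = {"A": 1.0e-3, "B": 2.0e-3, "C": 1.0e-2}  # hypothetical medians
    cutsets = [("A",), ("B", "C")]                     # hypothetical cutsets
    te = sample_top_event(cutsets, medians)
    print("mean:", sum(te) / len(te))
    print("90th percentile:", percentile(te, 0.90))
```

The sorted sample list answers the questions posed above directly: the sample mean estimates the expected value of TE, and the empirical percentiles estimate quantities such as the 90th percentile of TE.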
TABLE 3-1

Summary of Failure Rate Data Sources

DOCUMENT (Ref #)
    DATA CONTAINED

Nonelectronic (Mechanical) Parts Failure Rates, LMSC/DS20737, Revision C,
Nov. 26, 1966, prepared by J. T. Yee (1)
    o Compiled from various sources: RADC, Hughes Aircraft Co., TRW,
      LMSC internal documents.
    o Strictly mechanical components.
    o Provides failure rate (per hour) point estimates; includes
      environment/application, but no breakdown of failure modes.

Nonelectronic Parts Reliability Data, NPRD-3, RADC, Fall 1985, prepared by
Michael J. Ross (2)
    o Data from Rome Air Development Center for non-electronic parts.
    o Provides failure rate (per hour) point estimates, also 60% upper
      single sided, 20% lower and 80% upper intervals from chi squared
      distribution.
    o Includes breakdown by environment; failure mode distribution given
      separately.

RADC Nonelectronic Reliability Notebook, RADC-TR-85-194, Interim Report,
Oct. 1985, Hughes Aircraft Company (3)
    o Provides failure rate (per hour) estimates with 80% upper and
      lower bounds from exponential and Weibull distributions.
    o Includes breakdown by environment, not by failure mode.

Nonelectronic Reliability Notebook, RADC-TR-75-22, AD/A005-657, RADC,
Jan. 1985 (4)
    o Same as above, except uses 90% confidence limit.

IEEE Guide to the Collection and Presentation of Electrical, Electronic, and
Sensing Component Reliability Data for Nuclear-Power Generating Stations,
IEEE STD 500-1977, June 30, 1977 (5)
    o Provides failure rates per hour and/or per cycle; gives high, low,
      maximum, and recommended values for 90% confidence interval from
      chi squared distribution; gives breakdown by failure mode.

Reliability Prediction of Electronic Equipment, MIL-HDBK-217E, 15 Jan
1986 (19)
    o Provides equations and parameters for calculation of failure rates
      based on environment, quality, packaging, etc.
    o Data from RADC.

Handbook of Piece Part Failure Rates, Martin Marietta Corp., Denver
Division, GIDEP 031-1273 (27)
    o Point estimate failure rates for mechanical piece parts.

Handbook of Human Reliability Analysis with Emphasis on Nuclear Plant
Applications, NUREG/CR-1278, SAND 80-0200, August 1983
    o Human reliability data.
    o Human error probability shaping factors.
TABLE 3-2

SUMMARY OF MINIMUM SUCCESS
PARAMETERS AND CRITERIA

Component or System
    Success Parameter/Criteria

SSME System
    Two of three engines are available and fully functional for
    the first 5.8 minutes of the flight. Thereafter at least one
    engine is available to press to MECO and abort. No credit is
    given for the existence of a shutdown inhibit preventing
    shutdown of a second engine under many redline conditions.
    This conservative modeling assumption is used because we
    cannot assess the performance of a redline engine.

SSME
    All major components (e.g. LPFT, LPOT, HPFT, HPOT,
    OPOV, FPOV, MFV, MOV) are fully functional in order for
    an engine to be considered available.

ET Separation
    Complete separation of the ET from the orbiter on demand is
    considered a success. Partial, premature, or delayed
    separation are all considered to be failures.

Tank Pressurization
    Tank integrity is maintained during flight conditions by
    maintaining proper ullage pressure. Failure to maintain
    prescribed pressure results in structural damage to tanks
    and/or main engine pump cavitation.

Pressure Boundary Integrity
    Pressure boundary failure of a high pressure system is
    considered an immediate catastrophic failure. An ignition
    source must be present to cause such a failure in a low
    pressure system. Any break of propellant system,
    hydraulic system or pneumatic system piping in
    conjunction with an ignition constitutes a loss of vehicle.
Fault Tree Symbology and Terminology

(Symbol graphics not reproduced.)

NAME
    DESCRIPTION

"AND" gate
    All inputs must occur in order for the output to occur.

"OR" gate
    Any input must occur in order for the output to occur.

"NOT" gate
    Negated input causes the output to occur. In probabilistic terms,
    the output is the complement of the input or 1 - P(input).

"Combination" gate
    At least "n" of total inputs must occur in order for the output
    to occur.

"Transfer" In gate (shown with page YY, page ZZ *)
    A transferred branch of the tree appearing in "x" different
    locations as identified by page number(s) shown; if no page number
    is shown, that denotes a suppressed portion of the tree, which is
    presented in Appendix D.

* Appears beneath mnemonic descriptor.
NAME
    DESCRIPTION

Basic event *
    Lowest element in the fault tree. The basic event represents the
    limit of resolution of the fault tree.

Undeveloped event *
    Self explanatory; used to represent events considered outside the
    scope of analysis. Further definition or quantification may be
    required at a future date.

House Event
    Used as a toggle device (e.g., value of event is set to "0" or "1")
    to isolate branches of the fault tree as necessary. This is
    primarily used for time phase aspects of fault tree development.

Mnemonic Descriptor (shown as 12345678)
    Encoded information regarding event type, name, and failure mode.
    See Appendix B for details regarding Basic Event Mnemonics.

Event or gate descriptor
    A brief description of logical outcome of any event or gate.

"Transfer" out gate (shown with page YY, page ZZ)
    A transfer of a gate to other branches of the tree. "X" denotes the
    number of locations to which the gate was transferred.

* Appears beneath mnemonic descriptor.
TERM
    DESCRIPTION

Basic Event:
    The lowest order event developed in the fault tree logic model. In most
    cases this corresponds to a component failure, human error, or an
    environmental condition. Basic events are the inputs to the logic model.

Intermediate Event or Gate:
    A logical outcome resulting from a single or combination of basic event
    or lower order event occurring at any level in the fault tree.

Top Event:
    The logical outcome of a fault tree model. The top event in this analysis
    is the catastrophic loss of human life and/or STS vehicle and facilities.

Cutset:
    A combination of basic events which leads to the top event.

Minimal Cut Set:
    A cut set with no proper subset which is itself a cut set.

Event Sequence:
    The success and failure paths defined by critical time intervals, ET
    separation and mission abort landings.

Consequence:
    The outcome of an event tree sequence measured in terms of success, or loss
    probabilities. Consequences will be measured both in terms of loss of life
    and/or vehicle.

[Figure 3-3. Fault tree, edited version. Drawing number LMSC-F2230402; sheets dated 9/04/87 and 1/05/88. Fault tree graphics not reproduced.]


NOTE: 

FIGURE  3-3  IS  AN  EDITED  VERSION 
OF  THE  FAULT  TREE  CONTAINING  ALL 
ESSENTIAL  BRANCHES. 

IF  FURTHER  DETAIL  IS  REQUIRED, 
FIGURE  D-2  (EXPANDED  FAULT  TREE) 
SHOULD  BE  CONSULTED. 
Valve Schematic Designation
    Fault Tree Mnemonic
        Description of Basic Event or Gate

Tumble Valve
    PYRTTXPD
        Pyrotechnics on tumble valve fail to open the tumble valve at
        the time of ET separation. This basic event includes only
        pyrotechnic assembly failures and not the actuation circuitry
        failures.

Vent Relief
    PRVOOXOP, PRVHFXOP
        Vent valve on external tank opens and sticks open. This
        mechanical malfunction depressurizes the tanks.
    PRVHFXDO, PRVOOXDO
        Vent valves on external tanks fail to open when ullage
        pressure is too high, resulting in overpressurization of the
        tank and hydrodynamic instabilities in the propellant lines.

OB DISC (PD1), IB DISC (PD1)
    PNVTOFDC
        LO2 flapper valve failure to close on demand (i.e. mechanical
        failure of valve actuators) causes a possible collision
        between external tank and orbiter. Pneumatic supply or
        control system failures which prevent valve from closing are
        not part of this basic event.

OB DISC (PD2), IB DISC (PD2)
    PNVTFFDC
        LH2 disconnect valve failure to close on demand (i.e. mechanical
        failure of valve actuators) causes a possible collision
        between external tank and orbiter. Pneumatic supply or
        control system failures which prevent valve from closing are
        not part of this basic event.

OB DISC, IB DISC
    PNV3FDCS, PNV30DCS
        LH2 or LO2 disconnect valve fails in closed position due to
        mechanical causes, resulting in blockage of the corresponding
        propellant flow path.

FLOW CTRL (LV54, LV52)
    PNVCOICS, PNVLOICS, PNVROICS, PNVCFICS, PNVLFICS, PNVRFICS
        Tank ullage pressure (flow) control valves fail in the closed
        or partially closed position, restricting the pressurization
        flow from the engines to the external propellant tanks. This
        mechanical failure is associated with the flow control valves
        or their actuators. These basic events do not include spurious
        control signals which cause the valves to close.
    PNVCOICD, PNVLOICD, PNVROICD
        Tank ullage pressure (flow) control valves fail to close when
        required (e.g. due to overpressure conditions). This
        pressure regulation failure occurs due to mechanical causes
        associated with the flow control valves or their actuators.
        These basic events do not include spurious control signals
        which cause the valves to close.

RIV
    PNVVB1CS, PNVVB2CS
        Pogo accumulator recirculation valves fail in the closed
        position so as to prevent LO2 from recirculating. These basic
        events represent mechanical malfunctions associated with the
        valve and valve actuator. Spurious control signals which
        force the valve to close are treated separately. Note: failure
        to establish recirculation flow is assumed to cause pogo
        accumulator flooding and subsequent loss of pogo suppression
        subsystem.

PV2
    PN2COZOP, PNVCOZOP, PN2LOZOP, PNVLOZOP, PN2ROZOP, PNVROZOP
        Oxidizer prevalve fails in open position, preventing
        shutdown. PN2 denotes events occurring in flight; PNV
        denotes events occurring prior to SRB ignition. The cause of
        such failures is mechanical, internal to either the valve or
        valve actuator.
    PNVCOZCS, PNVLOZCS, PNVROZCS
        Oxidizer prevalve fails in closed position due to mechanical
        failures, resulting in cavitation of the high pressure fuel
        turbopump, turbine blade failure, and internal missile
        generation; this event also contributes to failure of the LH2
        system in that engine. Pneumatic supply or control system
        failures which cause spurious valve closure are not part of
        these basic events.

PV5
    PNVCFZOP, PNVLFZOP, PNVRFZOP
        Fuel prevalve fails in open position, preventing shutdown.
        The cause of such failures is mechanical, internal to either the
        valve or valve actuator.
    PNVCFZCS, PNVLFZCS, PNVRFZCS
        Fuel prevalve fails in closed position due to mechanical
        failures, resulting in cavitation of the high pressure fuel
        turbopump, turbine blade failure, and internal missile
        generation; this event also contributes to failure of the LH2
        system in that engine. Pneumatic supply or control system
        failures which cause spurious valve closure are not part of
        these basic events.

LPOT, LPFT


TDPCFLSZ  Low  pressure  turbopumps  fail  In  an  overspeed  or  underspeed 
TDPCOLSZ  condition  due  to  random  mechanical  failures  internal  to  the 
TDPLFLSZ  pump.  These  failures  are  non-catastrophic  and,  if  properly 
TDPLQLSZ  detected  and  corrected  by  shutdown , need  not  necessarily  lead 
TDPRFLSZ  to  loss  of  life  or  vehicle. 

TDPROLSZ 


NOTES TO FIGURE 3-4                                          LMSC-F223040

HPOT         TDPCFHSZ    High pressure turbopumps fail in an overspeed or
HPFT         TDPCOHSZ    underspeed condition due to random mechanical failures
             TDPLFHSZ    internal to the pump. These failures are non-catastrophic
             TDPLOHSZ    and, if properly detected and corrected by shutdown, need not
             TDPRFHSZ    necessarily lead to loss of life or vehicle.
             TDPROHSZ

HE           PNEUMCONTL  Pneumatic system is not available to each of the engines. All
             PNEUMCONTC  tanks, piping, regulators, control valves and associated
             PNEUMCONTR  hardware are contained within this gate.

HPOT         PRBCFSLK    Preburners on high pressure turbopumps leak or release
HPFT         PRBCOSLK    high pressure combustion products into the main engine
             PRBLFSLK    compartment. This failure is caused by leakage through
             PRBLOSLK    mechanical seals and joints between the preburner and pump
             PRBRFSLK    assemblies.
             PRBROSLK

OPOV         HY2CFWCD    Preburner valves (FPOV, OPOV) fail to close on demand due
FPOV         HYVCFWCD    to structural failure, thereby preventing shutdown. HYV
             HY2LFWCD    denotes event occurs prior to SRB ignition. HY2 denotes
             HYVLFWCD    event occurs in flight. Hydraulic supply, pneumatic supply,
             HY2RFWCD    and control system failures are not considered part of this
             HYVRFWCD    basic event.
             HY2COWCD
             HYVCOWCD
             HY2LOWCD
             HYVLOWCD
             HY2ROWCD
             HYVROWCD

MFV          HY2CFJCD    Main Fuel Valve (MFV) fails to close on demand due to
             HYVCFJCD    mechanical failures, thereby preventing shutdown. HYV
             HY2LFJCD    denotes event occurs prior to SRB ignition. HY2 denotes
             HYVLFJCD    event occurs in flight. Hydraulic supply, pneumatic supply,
             HY2RFJCD    and control system failures are not considered part of this
             HYVRFJCD    basic event.

MOV          HY2COJCD    Main Oxidizer Valve (MOV) fails to close on demand due to
             HYVCOJCD    structural failures, thereby preventing shutdown. HYV
             HY2LOJCD    denotes event occurs prior to SRB ignition. HY2 denotes
             HYVLOJCD    event occurs in flight. Hydraulic supply, pneumatic supply,
             HY2ROJCD    and control system failures are not considered part of this
             HYVROJCD    basic event.

MOV          HYVCFJCS    Engine propellant valves (MOV, MFV, FPOV, OPOV) fail in a
MFV          HYVCFWCS    closed or flow restricting position after SRB ignition. These
OPOV         HYVCOJCS    mechanical failures include failures of the valves and valve
FPOV         HYVCOWCS    actuators. Hydraulic supply, pneumatic supply, or control
             HYVLFJCS    system failures which cause spurious valve closure are not
             HYVLFWCS    part of these basic events.
             HYVLOJCS
             HYVLOWCS
             HYVRFJCS
             HYVRFWCS
             HYVROJCS
             HYVROWCS

HPOT         HEXCOPRP    Oxidizer high pressure turbopump (HPOT) heat exchanger
             HEXLOPRP    ruptures. The rupture is assumed to create a breach of HPOT
             HEXROPRP    preburner pressure integrity.

MCC          CHBURN      Main combustion chamber burns through due to corrosion,
                         random factors, or excessive chamber temperature. These
                         events are outside the scope of analysis and are undeveloped.


POGO         ACCCOMRP    POGO accumulator leaks or ruptures. The leakage will
             ACCLOMRP    probably result from breach of pressure boundary at or near
             ACCROMRP    the flanged seal. Other material migration paths include the
                         interface with the helium precharge valves and RIVs.
                         Ruptures are assumed to be random in nature, resulting from
                         a major breach in any part of the accumulator tank.

RIV          BLOCORRG    Recirculation/isolation valve fails (mechanical or
             BLOLORRG    structural) in POGO suppression system, resulting in
             BLORORRG    regulation failure in that engine.

                         Note: Inability to maintain LO2 bleed/GO2 recirculation flow
                         back into the main oxidizer feedline would cause the SSME
                         POGO accumulators to dump excess GO2 pressurant into the
                         inlet to the HPOT, causing possible pump cavitation and
                         overspeed with potential for uncontained engine damage.
                         (Ref. 7, p. 10-3)


GOX CNTL     BLOCOGRG    Gas control valve fails (mechanical or structural) in the
VLV          BLOLOGRG    POGO suppression system, resulting in regulation failure in
             BLOROGRG    that engine. A major rupture in the valve will cause pump
                         cavitation and subsequent explosion.


RELIEF       PRVVFKOP    Liquid propellant overboard relief valves fail in the open
RV5          PRVVOKOP    position, releasing propellant at an uncontrolled rate
RV6                      overboard and diverting flow en route to the main engines.
                         These failures are mechanical malfunctions of the relief
                         valves.


ISOL. VLV    PNVVONCD    Isolation valves upstream of the liquid propellant overboard
PV7          PNVVFNCD    relief valves fail to close on demand. That is, given that the
PV8                      overboard relief valve fails open, these pneumatically
                         actuated valves fail to isolate the flow. These failures are
                         mechanical failures associated with valve or valve actuator
                         malfunction. Spurious signals which prevent the valve from
                         closing are not included in this basic event.
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Section  4 

QUANTITATIVE  EVALUATION 


4.1 SYSTEM AND COMPONENT FAILURE RATES

Component failure rates and associated exposure times are used to calculate
failure probabilities for continuously operating components. Each basic event
failure rate depends on its component type, failure mode, operating mode, and
environmental application. The results of this data compilation effort are
summarized in Appendix C. The rationale for the established failure rates is
similarly included in the data summary.

The generic failure rates in Appendix C, Table C-1 are used in the
development of individual basic event failure rates. That is, a specific
failure rate or set of failure rates is derived for each basic event in the
fault tree. The data base derived from the generic failure rates and used in
fault tree computations is also provided in Appendix C. Because much of the
failure rate data is time-dependent or conditional, several sets of
probabilities are calculated. The rationale behind the use of time-dependent
data is contained in Section 6.

Structural failures, such as failure of the ET within its design envelope,
are not included within scope. Such failures are shown for modeling
completeness on the master fault trees but are not quantified in the final
computations.


4.2  HUMAN  ERROR 

Man-machine interactions are examined in two distinct ways: 1) as a source
introducing potential risk due to human error, and 2) as a means of recovering
from system failures or reducing an existing hazardous condition through
corrective action.

From T-10 seconds to T+8 minutes, human actions (either introducing or
recovering from errors) become secondary to automatic controls. In contrast,
ground operation errors may result in delayed sources of catastrophic accidents
if flight scrub safeguards fail to detect the error. Delayed effects include
those failures which do not manifest themselves until flight; for example, a
latent ground error will not cause an accident until after SSME ignition.

Table C-2 summarizes the various human failure rates per task or specific
operation. During ground fill operations, the human is assumed to provide only
backup and status monitoring functions. Fill operations are assumed to be
software controlled and fully automatic. For all tasks in which the operator
responds to a software or hardware failure, it is assumed that the operator
follows written procedures and that there is at least one other independent
check. Other major assumptions regarding human errors and response
characteristics are provided in Appendix C, Table C-3.

4.3 FAILURE PROBABILITY CALCULATIONS

The top event probability was determined using the CAFTA code developed by
Science Applications International Corporation. A description of the code and
its method for generating cutsets and event probabilities from failure rate and
exposure is provided in Section 3.3 and in Appendix I. The failure rate data
described in Appendix C were used to determine the input (basic event)
probabilities for the code. The probability was approximated by the product of
the failure rate and the exposure time. This approximation is fairly accurate
for probabilities below .001. This is the case for all basic events in the
MPPS fault tree model.
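
The accuracy of the rate-times-exposure approximation can be checked
numerically. The sketch below is illustrative only: it assumes a constant
failure rate (exponential model), and the function names and the rate and
exposure values are hypothetical, not taken from Appendix C.

```python
import math


def exact_failure_probability(rate_per_hour: float, exposure_hours: float) -> float:
    """Probability of at least one failure, assuming a constant failure rate."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-rate_per_hour * exposure_hours)


def approx_failure_probability(rate_per_hour: float, exposure_hours: float) -> float:
    """Rare-event approximation: failure rate multiplied by exposure time."""
    return rate_per_hour * exposure_hours


def relative_error(rate_per_hour: float, exposure_hours: float) -> float:
    """Fractional overestimate of the approximation relative to the exact value."""
    exact = exact_failure_probability(rate_per_hour, exposure_hours)
    return (approx_failure_probability(rate_per_hour, exposure_hours) - exact) / exact


if __name__ == "__main__":
    # Hypothetical (rate, exposure) pairs chosen only to span small and large products.
    for rate, hours in [(2.0e-5, 8.0), (1.0e-4, 10.0), (1.0e-2, 10.0)]:
        print(
            f"rate*t={rate * hours:.2e}  "
            f"exact={exact_failure_probability(rate, hours):.6e}  "
            f"relative error={relative_error(rate, hours):.3%}"
        )
```

The overestimate is roughly half the product itself, so at a product of .001
it is about 0.05 percent, and it grows to several percent as the product
approaches 0.1.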

Timing of a catastrophic failure is very important in determining the total
impact on the STS, STS crew, and surrounding facilities and personnel.
Therefore, different time intervals were defined to account for various system
configurations and consequences. In addition, the exposure time varies
depending on the functional requirements of the component (i.e., certain
components can only induce catastrophic failures during specific time intervals,
operations or system configurations). In this manner, the exposure time is set
as the longest period of time in which the postulated basic event failure can
occur. Components which are inactive (until required to operate) assume the
entire time duration from the start of launch as the exposure time; this
assumption errs in the direction of conservatism. A detailed description of
the time intervals and the basis for those intervals is provided in Section S.

The sum of top event probabilities for each of the mutually exclusive time
intervals yields the total probability of a catastrophic failure during the
period T-8 hours to ET separation.
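
Because the time intervals are mutually exclusive, the total is a plain sum of
the per-interval top event probabilities. A minimal sketch follows; the
interval labels and probability values are hypothetical placeholders, not
results from this analysis.

```python
import math

# Hypothetical per-interval top event probabilities (illustrative values only).
INTERVAL_TOP_EVENT_PROBABILITIES = {
    "interval_1": 1.0e-4,
    "interval_2": 6.0e-4,
    "interval_3": 3.0e-4,
}


def total_failure_probability(interval_probabilities: dict) -> float:
    """Sum top event probabilities over mutually exclusive time intervals."""
    for name, probability in interval_probabilities.items():
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"{name}: {probability} is not a probability")
    # fsum reduces floating-point round-off when adding many small terms.
    return math.fsum(interval_probabilities.values())


if __name__ == "__main__":
    total = total_failure_probability(INTERVAL_TOP_EVENT_PROBABILITIES)
    print(f"total probability over all intervals = {total:.3e}")
```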

Most of the failure probability occurs during the time interval T-10
seconds to zero thrust. Any failures which result from component leakages, ice
plugging, or related failures will most likely be realized during the early
seconds of flight. Subsequent portions of the flight are important
contributors, but add only a fractional contribution to the probability of
catastrophic failure.

It is important to note, however, that the initial calculation of the top
event is based on point estimate values for each of the basic events. In other
words, the input probabilities do not carry with them information regarding
their statistical distribution. The top event, therefore, does not contain any
information regarding its distribution.

4.4 SENSITIVITY ANALYSIS

Because of the unavailability of shuttle-specific failure data, generic
failure data was substituted. The subsequent uncertainty introduced by using
generic data may be addressed through sensitivity analysis. In sensitivity
analysis, the probabilities assumed for certain basic events are varied and the
effect on the top event probability noted.

Two commonly used sensitivity techniques are parametric variation and Monte
Carlo simulation. A brief discussion of how each technique is used and the
results produced by each approach is provided below.

Parametric Variation

The most straightforward means of performing a sensitivity analysis is by
changing the basic event probabilities for those events of highest importance
ranking. In addition, sensitivity analyses are performed for components of
special interest to design/analysis teams.

The sensitivity analysis is based on varying the following parameters:

o Reduce all seal failure rates to the 20% lower confidence interval
  value as dictated by the failure rate data base.

o Change undeveloped event VENTPANEL (orbiter aft compartment vent
  panel/door). Assume successful relief of pressure (following gross
  helium leakage or pneumatic system component rupture) 90% of the time.

o Change the availability of ignition source following gross leakage
  of propellant in aft compartment from 1.0 to 0.1.

o Reduce bleed valve/anti-flood valve failure rate to the lower 90%
  confidence limit established by the failure rate data base.

o Increase heat exchanger rupture/gross leakage failure rate by one
  order of magnitude (i.e., multiply failure rate by 10). NOTE: This
  great variation in failure rate is to illustrate the insensitivity of
  the top event to changes in the heat exchanger failure rate.

The sensitivity parameters are changed individually, and then changed
collectively. The results are presented in Table 4-1.
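
Parametric variation of this kind can be sketched on a toy model. The example
below is a hedged illustration, not the CAFTA model: the two cut sets, the
event names and all probabilities are hypothetical, and the top event is
approximated as the sum of cut set products (rare-event approximation).

```python
from math import prod

# Hypothetical minimal cut sets; each is a tuple of basic event names.
CUT_SETS = [("SEAL_LEAK", "IGNITION"), ("HE_RUPTURE", "VENT_FAILS")]

# Hypothetical baseline basic event probabilities.
BASELINE = {
    "SEAL_LEAK": 1.0e-3,
    "IGNITION": 1.0,
    "HE_RUPTURE": 5.0e-4,
    "VENT_FAILS": 1.0,
}


def top_event_probability(probabilities: dict, cut_sets=CUT_SETS) -> float:
    """Rare-event approximation: sum of the products over each cut set."""
    return sum(prod(probabilities[event] for event in cut_set) for cut_set in cut_sets)


def percent_change(changes: dict, baseline: dict = BASELINE) -> float:
    """Percent change in the top event when the given events are overridden."""
    varied = {**baseline, **changes}
    base_top = top_event_probability(baseline)
    return 100.0 * (top_event_probability(varied) - base_top) / base_top


if __name__ == "__main__":
    individual = [{"IGNITION": 0.1}, {"VENT_FAILS": 0.1}]
    for change in individual:
        print(change, f"{percent_change(change):+.1f}%")
    # Collective variation: apply every individual change at once.
    collective = {name: value for change in individual for name, value in change.items()}
    print("collective", f"{percent_change(collective):+.1f}%")
```

Varying one parameter at a time shows which assumption drives the result; the
collective run shows the combined effect when the changes interact through
shared cut sets.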

Synthetic  Sampling 

A second method of assessing the risk model's sensitivity to changes in
basic event failure probabilities is through the use of synthetic sampling
techniques such as Monte Carlo. Monte Carlo relies on the generation of random
samples of basic event failure probabilities from appropriate distributions. For
each set of basic event probabilities, a top event is calculated. If sufficient
samples (i.e., sets of basic event probabilities) are drawn, a distribution may
be determined for the top event. More details regarding this technique are
provided in Appendix J.
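
The sampling loop described above can be sketched as follows. This is a plain
Monte Carlo illustration under stated assumptions, not the TEMAC
implementation: the lognormal form, the error factors, the cut sets and the
median probabilities are all hypothetical choices made for the example.

```python
import math
import random
from math import prod

Z_95 = 1.645  # standard normal 95th percentile, used to convert error factors

# Hypothetical cut sets, median probabilities and error factors.
CUT_SETS = [("SEAL_LEAK", "IGNITION"), ("HE_RUPTURE", "VENT_FAILS")]
MEDIANS = {"SEAL_LEAK": 1.0e-3, "IGNITION": 0.5, "HE_RUPTURE": 5.0e-4, "VENT_FAILS": 0.5}
ERROR_FACTORS = {"SEAL_LEAK": 10.0, "IGNITION": 2.0, "HE_RUPTURE": 10.0, "VENT_FAILS": 2.0}


def sample_probability(rng: random.Random, median: float, error_factor: float) -> float:
    """Draw one lognormal sample; error factor = 95th percentile / median."""
    sigma = math.log(error_factor) / Z_95
    return min(1.0, rng.lognormvariate(math.log(median), sigma))


def sample_top_events(n_samples: int = 5000, seed: int = 1) -> list:
    """Return sorted top event probabilities, one per sampled set of inputs."""
    rng = random.Random(seed)
    tops = []
    for _ in range(n_samples):
        p = {name: sample_probability(rng, MEDIANS[name], ERROR_FACTORS[name])
             for name in MEDIANS}
        # Rare-event approximation: sum of cut set products.
        tops.append(sum(prod(p[event] for event in cut_set) for cut_set in CUT_SETS))
    tops.sort()
    return tops


def percentile(sorted_values: list, fraction: float) -> float:
    """Simple empirical percentile from an already sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


if __name__ == "__main__":
    tops = sample_top_events()
    print(f"5th={percentile(tops, 0.05):.2e}  median={percentile(tops, 0.50):.2e}  "
          f"95th={percentile(tops, 0.95):.2e}")
```

The interval between the 5th and 95th percentiles is the kind of 90% range
quoted for a top event distribution; stratified schemes such as Latin
Hypercube sampling reach a stable estimate with fewer samples than this
simple random draw.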


Various distributions were used to represent the basic event failure
probabilities to assess sensitivities in the fault tree model. The resulting
top event distribution showed that 90% of the top event probabilities were
expected to occur between 1.40E-04 and 4.50E-03. This range was computed by
the TEMAC code using a "Latin Hypercube" sampling algorithm. The computed point
estimate value in CAFTA was 2.20E-03. The range confirms that variations in top
event probability are substantially small. This increases the confidence that
the point estimate data used in the analysis gives a fairly accurate
representation of expected system performance.


Other TEMAC computer runs are presented in Appendix J for general reference.

4.5 IMPORTANCE CALCULATION OF DOMINANT EVENTS

CAFTA automatically computes five measures of quantitative importance and one
qualitative measure of structural importance. The results of the CAFTA
importance computations are presented in Appendix I for all events
which appear in cutsets whose probability is greater than

By focusing on those events which are the dominant contributors to top
event probability, according to their importance ranking, one can prioritize
those design efforts which reduce risk most effectively. Two measures of
importance are selected to illustrate this technique. Fussell-Vesely is
selected to represent a quantitative measure (i.e., importance is based on the
assigned probabilities). Qualitative or structural importance is also
summarized. The results of these computations are presented in Tables 4-2a and
4-2b for the Fussell-Vesely and structural measures, respectively. The
expressions used to compute these values are discussed in Section J.3.2.

4.5.1 Results of Fussell-Vesely Importance Ranking

Fussell-Vesely importance was computed for all basic events appearing in
cut sets of probability 10^-8 or greater.
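
For orientation, one common form of the Fussell-Vesely measure is the fraction
of the top event probability contributed by cut sets that contain the event.
The sketch below uses that form with the rare-event approximation; it is an
assumption for illustration and may differ from the exact expressions the
report refers to. The cut sets and probabilities are hypothetical.

```python
from math import prod

# Hypothetical minimal cut sets and basic event probabilities.
CUT_SETS = [("SEAL_LEAK", "IGNITION"), ("HE_RUPTURE", "VENT_FAILS")]
PROBABILITIES = {
    "SEAL_LEAK": 1.0e-3,
    "IGNITION": 1.0,
    "HE_RUPTURE": 5.0e-4,
    "VENT_FAILS": 1.0,
}


def cut_set_probability(cut_set, probabilities) -> float:
    """Probability of one cut set, assuming independent basic events."""
    return prod(probabilities[event] for event in cut_set)


def fussell_vesely(event: str, cut_sets=CUT_SETS, probabilities=PROBABILITIES,
                   truncation: float = 0.0) -> float:
    """Share of the top event probability from cut sets containing `event`."""
    kept = [cs for cs in cut_sets if cut_set_probability(cs, probabilities) >= truncation]
    top = sum(cut_set_probability(cs, probabilities) for cs in kept)
    with_event = sum(cut_set_probability(cs, probabilities) for cs in kept if event in cs)
    return with_event / top


if __name__ == "__main__":
    ranking = sorted(PROBABILITIES, key=fussell_vesely, reverse=True)
    for name in ranking:
        print(f"{name:12s} {fussell_vesely(name):.3f}")
```

Events that share a cut set receive the same value under this form, which is
why a conditional event such as an ignition source can rank alongside the
leak that it accompanies.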

The highest ranking basic event is leakage through LO2 system seals. This
is closely followed by failure of aft compartment 3-way solenoid pilot valves
which control orbiter valves. The "vent-to-port" failure mode of the valves may
overpressurize the aft compartment if the vent door fails to relieve the
overpressure condition. Most other major terms are associated with leaks or
ruptures in the balance of LO2/LH2 system components, seals and piping.

4.5.2 Results of Structural Importance Ranking

It is necessary, for computational purposes, to limit structural ranking to
those basic events which appear in sequences above 3.SE-05, or the presence of
too many singletons immediately below this truncation limit causes the
importance value to approach zero. It is important to observe that those basic
events which appear in singleton cut sets are all given a structural importance
of T. This is an inherent limitation of structural importance, but the measure
does provide some insight for those dominant basic events appearing above the
truncation limit.

The highest ranking basic events are CNDEZXIG and VENTPANEL, which
correspond to the presence of an ignition source in the aft compartment and the
ability of the aft compartment to relieve overpressure, respectively. These
basic events appear in most of the cutsets above the truncation limits and are
expected to be very high in structural importance.

Loss of HPOT seal purge due to gross depressurization of the pneumatic
control assembly also ranks high in structural importance. All remaining basic
events rank equally since they all appear in one and only one doubleton cutset.
These events are primarily related to propellant leaks or component ruptures
within the aft compartment.




TABLE 4-1

SENSITIVITY

Description of Basic Event Change

1) Seal failure rate reduced to 20% lower confidence interval value of
   1.71E-5 failures/hour (previously 2.09E-5 failures/hour).

2) Probability of orbiter vent panel failing to relieve pressure on demand
   = 0.1 instead of 1.0. This condition follows gross leakage or component
   rupture in helium pneumatic system.

3) Change probability of ignition given a major propellant spill in the
   orbiter from 1.0 to 0.1. This affects basic event CNDEZXIG.

4) Reduce bleed valves and anti-flood valve failure rate by one order of
   magnitude.

5) Increase failure rate of all electrical/electronic equipment by a
   factor of 10.

6) Pneumatic regulator failure rate reduced from 1.16E-4 failures/hour to
   1.10E-4 failures/hour.

7) Increase heat exchanger failure rate by one order of magnitude.

Collective change of items 1) thru 6).

1.14E-3        -SZ%
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SUMMARY  OF  HIGHEST  RANKING  BASIC  EVENT 
IMPORTANCE  VALUES 


o USING FUSSELL-VESELY MEASURES o

Basic Event   Importance  Description

VENTPANEL     3.SE-01     VENT DOOR ON ORBITER AFT COMPARTMENT
                          FAILS TO RELIEVE PRESSURE WHEN
                          COMPARTMENT OVERPRESSURIZATION OCCURS.
                          THIS EVENT IS APPLICABLE DURING ALL PHASES
                          OF FLIGHT. THE MECHANISMS AND DETAILS
                          REGARDING FAILURE REMAIN UNDEVELOPED.

CNDEZXIG      2.0E-01     AN IGNITION SOURCE IS PRESENT TO IGNITE
                          PROPELLANT RELEASED WITHIN THE ENGINE
                          COMPARTMENTS. THE PRIMARY SOURCES OF
                          IGNITION ARE THE HOT SURFACES OF THE SSME
                          PREBURNERS AND HOT GAS MANIFOLD.

FLGEOSLK      6.6E-02     FLANGE FAILURES IN THE LO2 SYSTEM WITHIN
                          THE MAIN ENGINE COMPARTMENT RESULT IN
                          LEAKAGE THROUGH SEALS. LEAKAGE THROUGH
                          SEALS COMBINED WITH AN IGNITION SOURCE IS
                          ASSUMED TO BE CATASTROPHIC.

CNOVZXIG      4.5E-02     AN IGNITION SOURCE IS PRESENT TO IGNITE
                          PROPELLANT LEAKS WITHIN THE ORBITER.
                          SOURCES OF IGNITION HAVE NOT BEEN
                          IDENTIFIED FOR LEAKS IN THIS LOCATION.

SPWPXDP       4.4E-02     PILOT VALVES ON ORBITER PNEUMATIC
                          ACTUATORS SPURIOUSLY VENT TO PORT,
                          DEPRESSURIZING PNEUMATIC SUPPLY TO THE
                          ACTUATORS AND RENDERING VALVE CONTROLS
                          INOPERATIVE. THIS BASIC EVENT REPRESENTS
                          THE SUM TOTAL OF ALL PILOT VALVES ON
                          MANIFOLDS FOR ORBITER VALVE ACTUATORS.

MPBVP3LK      4.1E-02     GROSS LEAKAGE THROUGH COMPONENT SEALS
MPBVPSLK      4.0E-02     CAUSES DEPRESSURIZATION OF THE HELIUM
MPBVPILK      4.0E-02     SUPPLY SYSTEM. LEAKAGE CONFINED
                          INTERNALLY TO THE COMPONENT IS NOT
                          INCLUDED IN THESE BASIC EVENTS. THE BASIC
                          EVENT REPRESENTS THE SUM TOTAL OF WELDS
                          IN A SECTION OF HELIUM SYSTEM PIPING AS
                          DESCRIBED IN THE FAULT TREE. EACH VALVE
                          WAS ASSUMED TO HAVE ONE SEAL.
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SUMMARY  OF  HIGHEST  RANKING  BASIC  EVENT 
IMPORTANCE  VALUES 


o USING FUSSELL-VESELY MEASURES o

Basic Event   Importance  Description

MPBEOSLK      3.SE-02     COMPONENTS SUCH AS VALVES AND RELIEF
                          DEVICES ON THE OXIDIZER SYSTEM WHICH ARE
                          LOCATED IN THE ENGINE COMPARTMENT
                          RUPTURE OR LEAK. RUPTURE IS SUFFICIENT TO
                          CAUSE MAJOR LOSS OF LO2.

FLGEFSLK      3.1E-02     FLANGE FAILURES IN THE LH2 SYSTEM WITHIN
                          THE MAIN ENGINE COMPARTMENT RESULT IN
                          LEAKAGE THROUGH SEALS. LEAKAGE THROUGH
                          SEALS COMBINED WITH AN IGNITION SOURCE IS
                          ASSUMED TO BE CATASTROPHIC.

MPBEFSLK      2.3E-02     COMPONENTS SUCH AS VALVES AND RELIEF
                          DEVICES OF THE LH2 SYSTEM WHICH ARE
                          LOCATED IN THE ENGINE COMPARTMENT LEAK
                          OR RUPTURE. RUPTURE IS SUFFICIENT TO CAUSE
                          MAJOR LOSS OF LH2.

MPBVOSLK      2.1E-02     COMPONENTS SUCH AS VALVES AND RELIEF
MPBVFSLK      2.1E-02     DEVICES ON THE PROPELLANT SYSTEM WHICH
                          ARE LOCATED IN THE ORBITER RUPTURE OR
                          LEAK. RUPTURE IS SUFFICIENT TO CAUSE
                          MAJOR LOSS OF PROPELLANT FROM EITHER THE
                          LO2 OR LH2 SYSTEMS.

MPBEOPRP      2.0E-02     PROPELLANT SYSTEM PIPING ASSOCIATED
                          WITH THE LO2 SYSTEM IN THE MAIN ENGINE
                          COMPARTMENT RUPTURES, THEREBY
                          RELEASING LIQUID INTO THE ENGINE
                          COMPARTMENT. RUPTURE IS SUFFICIENT TO
                          CAUSE MAJOR LOSS OF PROPELLANT FROM
                          THE LO2 SYSTEM.

FLGEJSLK      1.8E-02     FLANGE FAILURES IN THE GH2 SYSTEM WITHIN
                          THE MAIN ENGINE COMPARTMENT RESULT IN
                          LEAKAGE THROUGH SEALS. LEAKAGE THROUGH
                          SEALS COMBINED WITH AN IGNITION SOURCE IS
                          ASSUMED TO BE CATASTROPHIC.
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SUMMARY  OF  HIGHEST  RANKING  BASIC  EVENT  page 3 

IMPORTANCE  VALUES 

o USING FUSSELL-VESELY MEASURES o

Basic Event   Importance  Description

SPVLPCDP      1.6E-02     LEAKAGE OR RUPTURE CAUSES SOLENOID
SPVCPCDP      1.6E-02     VALVES IN THE PNEUMATIC CONTROL ASSEMBLY
SPVRPCDP      1.6E-02     (PCA) TO DEPRESSURIZE THE HELIUM SYSTEM.
                          DEPRESSURIZATION MAY OCCUR THROUGH
                          CRACKS IN THE VALVE WALLS OR THROUGH THE
                          VALVES' WELDED CONNECTIONS TO PCA PIPING.

BDPEFXRP      1.6E-02     BURST DIAPHRAGM LEAKS OR PREMATURELY
BDPEOXRP      1.6E-02     RUPTURES SO AS TO CAUSE PROPELLANT
                          SYSTEM PRESSURE BOUNDARY FAILURE. THIS
                          FAILURE CAN OCCUR IN BOTH THE LO2 AND LH2
                          SYSTEMS.

MPBEUPRP      1.5E-02     PROPELLANT SYSTEM PIPING ASSOCIATED
                          WITH THE GH2 SYSTEM IN THE MAIN ENGINE
                          COMPARTMENT RUPTURES. RUPTURE IS
                          SUFFICIENT TO PREVENT LH2 TANK
                          PRESSURIZATION.
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TABLE  4- 2b 

SUMMARY  OF  HIGHEST  RANKING  BASIC  EVENT 
IMPORTANCE  VALUES 

o USING STRUCTURAL MEASURES o

Basic Event   Importance  Description

CNDEZXIG      1.10E-02    CONDITIONAL PROBABILITY THAT AN IGNITION SOURCE IS PRESENT
                          TO IGNITE PROPELLANT RELEASED WITHIN THE MAIN ENGINE. THE
                          PRIMARY SOURCE OF IGNITION IS HOT SURFACES OF SSME
                          PREBURNERS, COMBUSTION PRODUCT EXHAUST PIPING, ETC.

VENTPANEL     4.40E-03    VENT DOOR ON ORBITER AFT COMPARTMENT FAILS TO RELIEVE
                          PRESSURE WHEN COMPARTMENT OVERPRESSURIZATION OCCURS.

FLGEJSLK      2.97E-03    FLANGE FAILURES RESULT IN LEAKAGE THROUGH GH2 SYSTEM
                          SEALS. LEAKAGE THROUGH SEALS COMBINED WITH AN IGNITION
                          SOURCE IS ASSUMED TO BE CATASTROPHIC. THIS BASIC EVENT
                          REPRESENTS THE SUM TOTAL OF FLANGE-RELATED FAILURES
                          WITHIN THE AFT COMPARTMENT.

SPVCPCDP      2.57E-03    LEAKAGE OR RUPTURE CAUSES SOLENOID VALVES IN THE
SPVLPCDP      (EACH)      PNEUMATIC CONTROL ASSEMBLY (PCA) TO DEPRESSURIZE
SPVRPCDP                  THE HELIUM SYSTEM.

CNOVZXIG      1.58E-03    CONDITIONAL PROBABILITY THAT AN IGNITION SOURCE IS
                          PRESENT TO IGNITE PROPELLANT RELEASED WITHIN THE ORBITER
                          FUSELAGE. SOURCES OF IGNITION HAVE NOT BEEN IDENTIFIED FOR
                          LEAKS IN THIS LOCATION.

BDPEFXRP      9.50E-04    BURST DIAPHRAGM LEAKS OR PREMATURELY RUPTURES SO AS TO
BDPEOXRP      (EACH)      CAUSE PROPELLANT SYSTEM PRESSURE BOUNDARY FAILURE. THIS
                          FAILURE CAN OCCUR IN BOTH THE LO2 AND LH2 SYSTEM.

FLGEFSLK      9.90E-04    FLANGE FAILURES RESULT IN LEAKAGE THROUGH SEALS. LEAKAGE
FLGEOSLK      (EACH)      THROUGH SEALS COMBINED WITH AN IGNITION SOURCE IS ASSUMED
                          TO BE CATASTROPHIC. THESE BASIC EVENTS REPRESENT THE SUM
                          TOTAL OF FLANGE-RELATED FAILURES OF BOTH THE LIQUID FUEL
                          AND OXIDIZER SYSTEMS WITHIN THE AFT COMPARTMENT.
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TABLE  4- 2b 

SUMMARY  OF  HIGHEST  RANKING  BASIC  EVENT 
IMPORTANCE  VALUES 

o USING  STRUCTURAL  MEASURES  * 

pafle  2 


8asic  Event 


Description 


Importance 


MPBEFSLK

MPBEOSLK

COMPONENTS SUCH AS VALVES AND RELIEF DEVICES ON THE
PROPELLANT SYSTEM WHICH ARE LOCATED IN THE MAIN
ENGINES RUPTURE OR LEAK. RUPTURE IS SUFFICIENT TO CAUSE
MAJOR LOSS OF PROPELLANT FROM EITHER THE LO2 OR LH2
SYSTEM.

9.90 E-04
(EACH)


MPBEOPRP

PROPELLANT SYSTEM PIPING ASSOCIATED WITH THE MAIN
ENGINE RUPTURES, THEREBY RELEASING LIQUID OXIDIZER INTO
THE AFT COMPARTMENT. RUPTURE IS SUFFICIENT TO CAUSE
MAJOR LOSS OF PROPELLANT FROM THE LO2 SYSTEM.

9.90 E-04


MPBVFSLK

MPBVOSLK

COMPONENTS SUCH AS VALVES AND RELIEF DEVICES ON THE
PROPELLANT SYSTEM WHICH ARE LOCATED IN THE ORBITER
RUPTURE OR LEAK. RUPTURE IS SUFFICIENT TO CAUSE MAJOR
LOSS OF PROPELLANT FROM EITHER THE LO2 OR LH2 SYSTEM.


9.90  E -04 
(EACH) 
GROSS LEAKAGE THROUGH COMPONENT SEALS CAUSES
DEPRESSURIZATION OF THE HELIUM SUPPLY SYSTEM.

PILOT VALVES ON ORBITER PNEUMATIC
ACTUATORS SPURIOUSLY VENT TO PORT,
DEPRESSURIZING PNEUMATIC SUPPLY TO THE
ACTUATORS AND RENDERING VALVE CONTROLS
INOPERATIVE.
Section 5

SYSTEMS DESCRIPTION
(Reference 12)


The MPPS furnishes the pressurant gas at the conditions necessary for
proper operation of the Main Propulsion System (MPS) from the beginning of
ground operations until successful return of the Shuttle Orbiter to Earth. The
MPPS consists of the external tank (ET), the Space Shuttle main engine (SSME),
and those components of the Orbiter which connect the ET to the SSME and provide
the necessary services for safe operation within the normal Space Transportation
System (STS) requirements. The system also includes the facility equipment and
the ground support equipment (GSE) necessary for servicing the Helium
Pressurization System and providing the necessary ET fuel and oxidizer
pressurization prior to SSME ignition. The MPPS must allow the SSME to shut
down safely during normal operation and protect the engine from catastrophic
damage when malfunctions within the engine or in any supporting system are
detected. Where possible, the system must provide backup to malfunctioning
systems and allow continuation of a mission or allow the mission to be safely
aborted.

The  MPPS  is  designed  to  provide  pressurization  services  from  prior  to  SSME 
ignition  throughout  ascent  and  insertion.  Pressurization  services  terminate 
with successful separation of the ET and purge of the residual propellants in
the Orbiter feed lines and the SSME. ET separation occurs approximately 8
minutes  after  lift-off  at  the  vehicle  velocity  state  vector  of  25,780  ft/sec  and 
an  altitude  of  E5  n.  mi.  The  MPPS  has  the  ability  to  overcome  failures,  which 
allows  successful  mission  completion  or  safe  abort,  depending  upon  the  time  at 
which  a failure  occurs.  The  mission  abort  modes  and  strategies  will  be 
discussed. 

This  section  was  adapted  from  Reference  22,  with  background  from  Ref.  G. 

A description of systems operations, physical characteristics and
significant  failure  modes  is  provided  in  the  paragraphs  below. 


5.1 LH2 AND LO2 PROPELLANT FLOW FUNCTIONS

The LH2 and LO2 systems are operationally very similar. Both systems draw
propellant from their respective external tanks and both direct the flow through
orbiter piping into the main engine assembly. The systems vary slightly in 1)
external tank/orbiter interface connections, and 2) provisions for POGO
suppression. Other differences are noted in Table 5-1.

Figures 5-1a, b, and 5-2a, b, are simplified schematics illustrating the
main propellant process flow for the LH2 and LO2 systems for ground and flight
configurations.
5.1.1 Propellant Flow Path

During flight, propellant is drawn from the external tank (ET) through an
ET/Orbiter disconnect valve. The flow is split equally into three separate
paths from a common manifold. Each flow path corresponds to one of the three main
engines.

Prior to reaching the main engine, each propellant flow passes through a
prevalve (i.e., one prevalve per flow path to the engine). The prevalves are
bi-stable valves in that they can only assume a full open or full closed
position. The full closed position serves an isolation function. When full
open, a flow path to the main engine is maintained.

Downstream of the prevalves, the flow enters the suction side of a low
pressure turbopump. A second high pressure pump draws the propellant from the
low pressure pump discharge and directs most of the flow towards the main burner
through a main flow control valve. The main process flow configuration is
schematically identical for both the LH2 and LO2 lines. All three engines are
likewise identical in configuration.

5.1.2  Propellant  Pressure  Boundary 

The ET contains the liquid propellants, liquid hydrogen (LH2) fuel and
liquid oxygen (LO2) oxidizer at the required ratio (approximately 6:1), and
supplies them with the proper temperature, density, and pressure required to
prevent pump cavitation to the three Space Shuttle main engines (SSME's) from
lift-off through ascent to main engine cutoff (MECO). After MECO, the ET is
jettisoned, enters the Earth's atmosphere, and impacts in a remote area of the
Indian Ocean.

The LO2 tank is located in the forward part of the ET. It contains
approximately 1,351,230 pounds of liquid oxygen. The tank feeds a
17-inch-diameter feed line passing from the bottom of the tank through the
intertank structure, then external to the aft right-hand ET/Orbiter disconnect.
The intertank is a cylindrical structure which houses the ET instrumentation
components and provides an umbilical plate that interfaces with the GSE arm for
helium gas supply, hazardous gas detection, and gaseous hydrogen boil off during
prelaunch operations. The LH2 tank, which is located aft, contains
approximately 227,550 pounds of liquid hydrogen and supplies fuel through a
17-inch-diameter feed line to the aft left-hand disconnect.

Because of the great difference in density of fuel and oxidizer, the
hydrogen tank contains one-sixth the total weight of propellants and is
approximately 2.7 times the volume of the LO2 propellant tank. The LO2 tank is
located forward to obtain a favorable center of gravity (c.g.) location for the
entire vehicle.
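
The tank figures quoted above can be cross-checked with a short calculation. The sketch below is illustrative only: the variable names are assumptions, and the only inputs are the two propellant weights and the 2.7 volume ratio given in the text.

```python
# Propellant loads quoted in the text (pounds).
LO2_WEIGHT_LB = 1_351_230
LH2_WEIGHT_LB = 227_550

# Volume ratio quoted in the text (LH2 tank volume / LO2 tank volume).
LH2_TO_LO2_VOLUME_RATIO = 2.7

# Oxidizer-to-fuel weight ratio.
mixture_ratio = LO2_WEIGHT_LB / LH2_WEIGHT_LB

# Hydrogen share of the combined propellant weight.
lh2_fraction = LH2_WEIGHT_LB / (LO2_WEIGHT_LB + LH2_WEIGHT_LB)

# Density ratio implied by the weights and the volume ratio:
# (W_lo2 / V_lo2) / (W_lh2 / V_lh2) = mixture_ratio * (V_lh2 / V_lo2).
implied_density_ratio = mixture_ratio * LH2_TO_LO2_VOLUME_RATIO

print(f"LO2:LH2 weight ratio  = {mixture_ratio:.2f}")
print(f"LH2 share of total    = {lh2_fraction:.3f}")
print(f"Implied density ratio = {implied_density_ratio:.1f}")
```

With these inputs the hydrogen weight is close to one-sixth of the oxygen weight (about one-seventh of the combined load), and the implied density of liquid oxygen is roughly 16 times that of liquid hydrogen.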

The Propellant Piping in the Orbiter consists of manifolds, distribution
lines and valves which circulate propellant to condition the system and route
the fuel and oxidizer through prevalves to each of the three SSME's. This
subsystem also consists of the distribution lines and valves which furnish
pressurant gas to the ET after SSME ignition and until MECO.
5.1.3 Control of Major Mechanical Components

Valves within the main process flow are hydraulically or pneumatically
actuated, or both.

Flow is regulated by adjusting the speed on the high pressure pump. The
high pressure pump consists of a preburner/turbine pump mechanism from LO2 and
LH2 extraction lines. The oxidizer flow is adjusted by a throttling valve, a
preburner oxidizer valve which controls the rate of combustion in the pump
preburner and thus pump speed. The high pressure pump configurations are
identical for the LO2 and LH2 systems.

The low pressure oxidizer pump is driven by an extraction line from the
discharge of the high pressure oxidizer turbopump. The low pressure LH2
turbopump is driven off the main engine burner combustion pressure. Exhausts
from the turbopump are combined with the exhausts from the LH2 and LO2
preburners and recycled into the main engine burner.


5.1.4 External Tank Pressurization

This section describes the pressurization of the LO2 and LH2 tanks.

Effects of failures assume that all three engines are running. Engine out
failure modes are discussed in Section 5.5.

5.1.4.1 LO2 Tank

High pressure liquid oxygen from the HPOT is fed to the MCC and the
preburner pump. Small quantities are bled through the anti-flood valve (AFV) to
the heat exchanger (HEX). Part of the resulting gaseous oxygen (GO2) is used
for Pogo suppression, and the rest is routed through the oxygen FCV's for
pressurization of the ET (Ref. 17, p. 1.2-1).

The LO2 tank pressurization line provides the means of transporting the
pressurant to the ullage area to assure the required LO2 interface pressure and
tank pressure. (Ref. 8, p. P-6) Loss of ullage pressurant flow from the period
of engine start to one second after lift-off could violate the 18.3 psig minimum
ullage pressure requirement at lift-off plus one second causing potential tank
damage (shear buckling) with the potential for TPS loss and eventual tank
rupture. (Ref. 7, p. 4-10A) Loss of LO2 ullage pressure during flight could
result in ET structural failure.

Described below are the major constituents and failure modes of the LO2
tank pressurization subsystem.

Components Forming GO2 Pressure Boundary

Major external leakage of GO2 components (line segments, flex couplings,
bellows, seals) may cause loss of GO2 and possible structural failure of the LO2
tank. (Ref. 8, p. PA-6)
Failure of one GO2 engine isolation check valve to open would prevent
pressurization gas from that engine from reaching the tank. This could also
cause possible rupture of SSME heat exchanger coil allowing mixture of fuel rich
exhaust gas and LO2. (Ref. 18, p. 354)

Flow Control Valve

Each pressure sensor controls a flow control valve for one of the three
orbiter main engines. At engine start, the three orbiter flow control valves
are closed since the tanks are pressurized. (Ref. 8, p. E-9)

To maintain the desired ullage pressure, the flow control valves are
automatically opened if the tank pressure drops out of the control band. There
is no manual control for the LO2 flow control valves. (Ref. 17, p. 2.1-4)

Failure of a single LO2 pressurant flow control valve to open to increase
GO2 pressurant flow will not affect the system. A second valve failing closed,
or a GO2 engine isolation check valve failing closed in another engine will
result in low ullage pressure, possibly violating the ET structural safety
factor. All 3 flow control valves failing closed may result in ET structural
failure and loss of crew/vehicle. (Ref. 18, p. 365)
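
The failure effects in the preceding paragraph can be summarized as a lookup on the number of blocked pressurant paths. This is a minimal sketch, not part of the fault tree itself: the names `Effect` and `lo2_pressurant_effect` are hypothetical, and treating a failed-closed check valve as equivalent to a failed-closed flow control valve in the same engine path is an assumption drawn from the wording above.

```python
from enum import Enum


class Effect(Enum):
    NO_EFFECT = "no effect on the system"
    LOW_ULLAGE = "low ullage pressure; ET structural safety factor may be violated"
    POSSIBLE_LOSS = "possible ET structural failure and loss of crew/vehicle"


def lo2_pressurant_effect(paths_failed_closed: int) -> Effect:
    """Map the number of blocked GO2 pressurant paths (0-3) to the stated effect.

    Assumption: a path counts as blocked if its flow control valve or its
    engine isolation check valve has failed closed.
    """
    if not 0 <= paths_failed_closed <= 3:
        raise ValueError("there are three engine pressurant paths")
    if paths_failed_closed <= 1:
        return Effect.NO_EFFECT
    if paths_failed_closed == 2:
        return Effect.LOW_ULLAGE
    return Effect.POSSIBLE_LOSS


for n in range(4):
    print(n, lo2_pressurant_effect(n).value)
```

The lookup makes the single-failure tolerance explicit: one blocked path is absorbed, and the consequences escalate only with a second and third failure.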

A clogged orifice in one leg of a flow control assembly results in loss of
only 1/2 flow capacity of one valve. Other valves will maintain adequate ET
pressure. (Ref. 18, p. 367)

Body burn-through of a LO2 flow control valve caused by impact of particles
or excessive GO2 temperatures will cause loss of GO2 pressurant to ET. Release
of hot GO2 into orbiter aft bay may result in overpressurization and orbiter
structural damage. Hot GO2 impingement may cause damage to surrounding
system/components. (Ref. 18, p. 368)

Disconnect  Valve 

The LO2 tank pressurization disconnect transmits pressurant flow from the
Orbiter to the external tank in flight and from the ground during tank
prepressurization operations. The ET/Orbiter interface consists of a
2-in.-diameter disconnect valve. The disconnect contains coaxial poppets which
are held open mechanically when the disconnect halves are engaged and closed
with spring force once disengaged. Sealing is accomplished by metal-to-metal
seal with internal gas pressure assisting the effectiveness of the seal. The
gas trapped between the two poppet closures during disengagement is allowed to
dump freely. After umbilical separation the Orbiter half of the disconnect
serves as a closeout for the main engine pressurization system, preventing
contamination of this system during atmospheric exposure. The tank half of the
disconnect prevents loss of pressurant from the tank, minimizing thrust reaction
on the tank during tank separation and free fall.

External leakage caused by seal fracture can possibly reduce ullage
pressure. This reduction is not sufficient to be critical, since the mating
flange design restricts the flow path to 0.008 square inches with total seal
failure. Failure of the disconnect to remain open during ascent can result in
possible rupture of the pressurization line and low LO2 ullage pressure. This
can lead to possible early LO2 depletion and SSME shutdown. There is a
possibility of the loss of crew/vehicle if the line ruptures and the aft bay
compartment is overpressurized. Failure of the disconnect to close during ET
separation will result in contamination of the Orbiter pressurization line
during reentry.
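
To give a sense of scale for the 0.008 square inch leak path cited above, the short calculation below converts it to an equivalent circular orifice and compares it with the disconnect bore. It is a rough sketch: treating the quoted 2-in. diameter as the open flow bore is an assumption, and the variable names are illustrative.

```python
import math

# Flow path area with total seal failure, from the text (square inches).
LEAK_AREA_SQ_IN = 0.008

# Assumption: the quoted 2-in. disconnect diameter is the open flow bore.
DISCONNECT_DIAMETER_IN = 2.0

# Diameter of a circular orifice with the same area: A = pi * d^2 / 4.
equivalent_diameter_in = math.sqrt(4.0 * LEAK_AREA_SQ_IN / math.pi)

# Leak area as a fraction of the full bore area.
bore_area_sq_in = math.pi * DISCONNECT_DIAMETER_IN**2 / 4.0
area_fraction = LEAK_AREA_SQ_IN / bore_area_sq_in

print(f"Equivalent orifice diameter: {equivalent_diameter_in:.3f} in.")
print(f"Fraction of bore area:       {area_fraction:.4%}")
```

Under these assumptions the leak path is equivalent to an orifice of about 0.1 in. diameter, or roughly a quarter of one percent of the bore area, which is consistent with the statement that the resulting pressure reduction is not critical.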

Diffuser Assembly

A cylindrical diffuser is located internal to the LO2 tank at the
pressurization line outlet. The diffuser is equipped with a perforated
cylindrical core with more than 1500 holes. The external portion of the
diffuser is a mesh screen. The diffuser reduces the entrance velocity of the
incoming pressurant gas to provide uniform distribution of the gases in the
ullage. A pressure reduction orifice is located at the inlet to the diffuser to
avoid problems with high Mach number flow in the pressurized line. (Ref. 8, p.


Structural  failure  of  the  diffuser  assembly  or  screen  could  cause  loss  of 
capability  to  diffuse  pressurant  flow  which  would  result  in  ullage  pressure 
collapse.  (Ref.  8,  p.  PA-7) 

Vent/Relief  Valve 

The LO2 vent/relief valve is a normally closed, spring-loaded valve which
is actuated open by Ground Support Equipment (GSE) helium prior to
prepressurization and launch.


The valve is held open during loading to allow the escape of purge and
pressurant  gas  as  it  is  displaced  by  the  propellant  and  the  propellant  boil-off. 

In the event that the tank pressure gets too high, the valve will relieve
to protect the tank structure. The LO2 relief pressure is 24 +- 1 (plus or
minus 1) psig. The LO2 valve vents directly to the atmosphere. (Ref. 17,
p. 2.3-1)

For valve relief mode operations, the two stage valve design utilizes a
primary sensing pilot and a secondary slave pilot. The primary pilot uses local
ambient pressure as a reference pressure (sensed at ambient pressure sense
port). The primary pilot provides control so that valve relief will occur. The
secondary pilot allows flow to the main piston in response to a signal from the
primary pilot during relief operation. The primary and secondary pilot inlets
are connected to the main valve inlet cavity. (Ref. 17, p. 2.3-1)

The inlet of the vent/relief valve is fastened to the LO2 tank forward dome
ogive  coverplate.  The  valve  outlet  is  tee  connected  to  plenums  on  opposite 
sides  of  the  nose  fairings  to  provide  non-propulsive  venting.  (Ref.  8,  p.  P-7) 

Failure  of  the  vent/relief  valve  to  remain  closed  or  structural  failure  of 
the  valve  assembly  resulting  in  external  leakage  will  cause  loss  of  ullage 
pressure. (Ref. 7, pp. PA-11 - PA-18)

The LO2 vent/relief valve position indicator switch tolerance allows valve
to indicate closed when it may be open up to 0.30 inches. This condition could
allow undetected ullage gas leakage prior to launch. Hot GO2 may autoignite TPS
during flight.
5.1.4.2 LH2 Tank

Components Forming GH2 Pressure Boundary

Major external leakage of GH2 components (line segments, flex couplings,
bellows, seals) may cause loss of GH2 and possible structural failure of the LH2
tank. (Ref. 8, p. PA-46)

Failure of an engine isolation check valve to open would prevent
pressurization gas from that engine from reaching the tank. Failure of a second
check valve or a flow control valve in another engine will result in
insufficient LH2 ullage pressure. (Ref. 18, p. 341)

Flow Control Valves

Each pressure sensor controls a flow control valve for one of the three
orbiter main engines. At engine start, the three orbiter flow control valves
are closed since the tanks are pressurized. (Ref. 8, p. E-9)

To maintain the desired ullage pressure, the flow control valves are
automatically opened if the tank pressure drops out of the control band. The
LH2 flow control valves can be manually opened by the crew if necessary. (Ref.
17, p. 2.1-4)

Failure of a single LH2 pressurant flow control valve to open to increase
GH2 pressurant flow will not affect the system. A second valve failing closed,
or a GH2 engine isolation check valve failing closed in another engine will
result in insufficient ullage pressure, resulting in 3 SSME shutdown. All 3
flow control valves failing closed may result in ET structural failure and loss
of crew/vehicle. (Ref. 18, p. 338)

A clogged orifice in one leg of a flow control assembly results in loss of
1/2 the flow capacity from one engine. Pressurization flow from the other two
engines will maintain adequate ET pressure. (Ref. 18, p. 340)

Disconnect Valve

The LH2 tank pressurization disconnect transmits pressurant flow from the
Orbiter to the external tank in flight and from the ground during tank
prepressurization operations. The ET/Orbiter interface consists of a
2-in.-diameter disconnect valve. The disconnect contains coaxial poppets which
are held open mechanically when the disconnect halves are engaged and closed
with spring force once disengaged. Sealing is accomplished by metal-to-metal
seal with internal gas pressure assisting the effectiveness of the seal. The
gas trapped between the two poppet closures during disengagement is allowed to
dump freely. After umbilical separation the Orbiter half of the disconnect
serves as a closeout for the main engine pressurization system, preventing
contamination of this system during atmospheric exposure. The tank half of the
disconnect prevents loss of pressurant from the tank, minimizing thrust reaction
on the tank during tank separation and free fall.

External leakage caused by seal fracture can possibly reduce ullage
pressure. This reduction is not sufficient to be critical, since the mating
flange design restricts the flow path to 0.008 square inches with total seal
failure. Failure of the disconnect to remain open during ascent can result in
possible rupture of the pressurization line and low LH2 ullage pressure. This
can lead to possible early LH2 depletion and SSME shutdown. There is a
possibility of the loss of crew/vehicle if the line ruptures and the aft bay
compartment is overpressurized. Failure of the disconnect to close during ET
separation will result in contamination of the Orbiter pressurization line
during reentry.

Diffuser

A cylindrical diffuser is located internal to the LH2 tank at the
pressurization line outlet. The diffuser is equipped with a perforated
cylindrical core with more than 1500 holes. The external portion of the
diffuser is a mesh screen. The diffuser reduces the entrance velocity of the
incoming pressurant gas to provide uniform distribution of the gases in the
ullage. A pressure reduction orifice is located at the inlet to the diffuser to
maintain back pressure on the pressurization line. (Ref. 8, p. P-10)

Vent/Relief Valve

In  the  event  of  excessive  tank  pressure,  the  valve  will  relieve  to  protect 

the  tank  structure.  The  LH2  relief  and  reseat  pressures  are  36  +-  1 psig  and  3d 

psig (minimum). The LH2 valve vents through the ET/ground carrier umbilical
prior to launch. (Ref. 17, p. 2.3-1)

For valve relief mode operations, the two stage valve design utilizes a
primary sensing pilot and a secondary slave pilot. The primary pilot uses local
ambient pressure as a reference pressure (sensed at ambient pressure sense
port). The primary pilot provides control so that valve relief will occur. The
secondary pilot allows flow to the main piston in response to a signal from the
primary pilot during relief operation. The primary and secondary pilot inlets
are connected to the main valve inlet cavity. (Ref. 17, p. 2.3-1)

Failure  of  the  vent/relief  valve  to  remain  closed  or  structural  failure  of 
the  valve  assembly  resulting  in  external  leakage  will  cause  a loss  of  ullage 
pressure.  (Ref.  8,  pp.  PA-50  - PA-58) 

The  LH2  vent/relief  valve  position  indicator  switch  tolerance  allows  valve 
to  indicate  closed  when  it  may  be  open  up  to  0.30  inches.  This  condition  could 
allow  undetected  ullage  gas  leakage  during  flight. 

HPOT Heat Exchanger

See  Appendix  E for  details. 


5.2  SUPPORT  SYSTEMS 

Support systems are those systems and individual components which are
essential to the operation of the critical mechanical components.
Such systems include the pneumatic and hydraulic control. Electrical
instrumentation and control (EI&C) and electrical power (EP) are sometimes
considered support systems in that they act mainly to support the function of
the critical mechanical components. However, due to the complexity of these
systems and their interrelations with systems outside the MPPS, EI&C and EP will
be addressed only in a limited sense in this study.

5.2.1 Pneumatic System

The Pneumatic (Helium) Pressurization Subsystem consists of helium supply
tanks, regulators, check valves, control valves, and distribution lines. The
subsystem supplies helium used within the engine for purging the high-pressure
oxidizer turbopump (HPOT) intermediate seal, for purging the engine after shut
down, and for actuating the valves during emergency shutdown. The balance of
the helium is used to actuate the pneumatically operated propellant valves
within the Propellant Delivery Subsystem and to pressurize the propellant lines
prior to re-entry.

A brief  description  of  each  of  those  functions  is  provided  in  the 
paragraphs  below.  A simplified  schematic  of  the  pneumatic  system  is  illustrated 
in  Figure  5-4. 

Control of Pneumatically Actuated Valves

Helium pressure is used to close the ET/Orbiter disconnect valves and
propellant prevalves. These valves are closed only once during launch.
Disconnect valves are closed a few seconds prior to ET separation following
MECO.

Under normal flight conditions, prevalves are opened at T-10 seconds (to
allow propellant flow to the main engines). The valves remain open until MECO
conditions are met. The valves then are actuated to a fully closed position.

If  emergency  shutdown  of  an  engine  is  required,  prevalves  will  be  sequentially 
closed  following  SSME  valve  closure. 

Failure  of  the  pneumatic  system  will  cause  both  disconnect  and  prevalves  to 
remain  in  the  open  position. 

The helium system also provides supply pressure to pogo system valves
described in Section 5.3.3. Loss of valve control and post MECO charging is
considered to be a catastrophic event.

Control of Hydraulically Actuated Valves

SSME  valves  are  regulated  by  the  engine  controller  using  hydraulic  supply 
pressure.  A sudden  drop  in  hydraulic  pressure  will  cause  pneumatic  backup  to 
the  valve  actuators  to  be  initiated.  A pneumatic  shutdown  of  the  engines  occurs 
when  a hydraulic  lock  condition  has  occurred  and  engine  isolation  is  required. 

A description  of  this  process  is  provided  in  Sections  5.2.2  and  5.4.3. 

HPOTP Intermediate Seal Purge

The high pressure oxidizer turbopump is powered by the oxidizer preburner.
Combustion of LO2 and LH2 in the preburner creates a hydrogen rich mixture at
the harsh temperature and pressure of 1405 deg. R and 5180 PSIA, respectively.
The HPOT pumps LO2 at an outlet temperature and pressure of 1 3d deg. R and 4621
PSIA, respectively. If the hot gas and cold oxidizer meet, an immediate
explosion may occur. (Ref. 17, p. 1.10-3)

Mixing of oxidizer and turbine gas is prevented by a dynamic shaft seal
package that is between the main pump and the turbine. The seal package
consists of a labyrinth-type primary oxidizer seal, a purged controlled-gap
intermediate seal, and two controlled-gap turbine hot-gas seals. Drain cavities
with overboard drain lines are located between the primary oxidizer seal and the
intermediate seal, between the intermediate seal and the secondary turbine seal,
and between the secondary and primary turbine seals. To further insure against
the mixing of oxidizer and turbine gas, a helium purge is applied between the
elements of the intermediate seal during engine operation. (Ref. 21, p. 1-22)

During ground operations, the intermediate seal cavity is purged with
nitrogen to remove any residual air or moisture and to inert the system. This
purge medium changes to helium immediately preceding engine start. Start is
inhibited by inability to verify adequate purge pressure during propellant
conditioning. (Ref. 11, p. 2-31)

The limit control system initiates shutdown for loss of intermediate seal
purge pressure, excessive secondary seal cavity pressure or primary LO2 drain
pressure or HPOT turbine discharge temperature exceeding high or low limits.
(Ref. 11, p. 2-28)

If redline limits are being violated and auto-shutdown has been inhibited
due to an engine loss or earlier crew decision, immediate crew action is
required to shutdown that engine. It is possible that complete engine failure
will occur so quickly that neither the crew nor the ground will have time to
react (Ref. 17, p. 1.10-3). Loss of the HPOT intermediate seal purge during
engine shutdown could potentially cause mixing of LO2 and turbine gases
resulting in possible engine damage. However, loss of the engine shutdown purge
would occur only if the SSME LIMIT CONTROL ENABLE/INHIBIT switch was in the
inhibit position.
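
The redline handling described in the last two paragraphs can be sketched as a small decision function. This is an illustration of the stated logic only, not the engine controller's implementation: the class `HpotRedlines`, its field names, and the returned strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HpotRedlines:
    """Hypothetical flags, one per redline condition named in the text."""
    intermediate_seal_purge_lost: bool = False
    secondary_seal_cavity_pressure_high: bool = False
    primary_lo2_drain_pressure_high: bool = False
    turbine_discharge_temp_out_of_limits: bool = False

    def any_violated(self) -> bool:
        return (
            self.intermediate_seal_purge_lost
            or self.secondary_seal_cavity_pressure_high
            or self.primary_lo2_drain_pressure_high
            or self.turbine_discharge_temp_out_of_limits
        )


def required_response(redlines: HpotRedlines, limits_inhibited: bool) -> str:
    """Return the response the text calls for under the given conditions."""
    if not redlines.any_violated():
        return "continue"
    if limits_inhibited:
        # Auto-shutdown is inhibited, so the crew must shut the engine down.
        return "crew shutdown required"
    return "automatic shutdown"


print(required_response(HpotRedlines(intermediate_seal_purge_lost=True), False))
print(required_response(HpotRedlines(intermediate_seal_purge_lost=True), True))
```

The sketch does not capture timing: as noted above, a failure may develop faster than the crew or the ground can react when the limits are inhibited.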

A complete failure of the secondary seal may not result in an engine loss
with the limits inhibited, since the hot gas still must pass the primary seal
and the intermediate seal to get to the LO2 in the pump. Thus, if the engine is
running with limits inhibited and the secondary seal redline is violated, but
the intermediate seal redline is not violated, the engine has a high likelihood
of running until a safe abort region is reached (Ref. 17, p. 1.10-3).

5.2.2  Hydraulic  System 

The  hydraulic  system  is  included  in  the  model  only  to  the  extent  to  which 
it  services  the  flow  control  valves.  Only  the  piping  and  other  mechanical 
pressure  boundary  components  within  the  SSME’s  have  been  reviewed.  The 
hydraulic  system  also  services  other  portions  of  the  shuttle  which  are  not 
related to the operation of flow control valves in the main engine. These
functions are not included in the fault tree.
5.3 PRESSURE CONTROL

5.3.1 Pressure Sensing

It is essential that proper ullage pressure be maintained. Insufficient
net positive suction pressure (NPSP) may cause engine pump cavitation and
subsequent explosion.


LO2 Tank


LO2 tank pressure is maintained by the tank ullage pressure transducer
control circuit providing discrete pressure/flow control valve open (full flow)
and closed (reduced flow) signals in accordance with the sensed tank pressure.
The external tank contains four ullage pressure transducers with three of the
four transducers used for controlling the operation of the flow control valves.
Each transducer is dedicated (electronically assigned) to an engine and provides
direct control for that engine's flow control valve. The fourth transducer is
switched into the control circuit should an ullage pressure transducer failure
occur prior to launch. (Ref. 7, p. B-J)

The LO2 tank has gauge type transducers. The gauge transducers have
individual ambient sense ports that can fail due to plugging from contamination
or icing and could result in a transducer reading low. (Ref. 8, p. E-10)

Failure of one sensor reading lower than actual tank pressure will open the
corresponding FCV early. Tank pressure will remain within nominal limits with
one failed sensor. If two or three sensors read lower than actual pressures and
the vent/relief valve fails closed, tank overpressurization will result. Relief
valve operation could cause loss of usable propellant. (Ref. 8, p. EA-41)

Two or three sensors reading higher than actual pressure will cause flow
control valves to shut off too soon, causing tank underpressurization. (Ref. 8,
p. EA-9)
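The failure effects described above can be summarized in a short sketch. This is only an illustration of the stated logic, not the actual control circuit: the function names, the `setpoint` parameter, and the signed-error representation of a failed transducer are all assumptions introduced here.

```python
def fcv_command(sensed_pressure, setpoint):
    """Discrete flow control valve (FCV) command for one engine's transducer.

    The setpoint value is hypothetical; the report does not give one.
    """
    # Low sensed ullage pressure -> open (full flow); otherwise closed (reduced flow).
    return "open" if sensed_pressure < setpoint else "closed"


def tank_effect(reading_errors, relief_valve_works=True):
    """Classify the tank-level effect of transducer reading errors.

    reading_errors holds one signed error per active transducer (three are
    active); negative means the sensor reads low, positive means it reads high.
    """
    low = sum(1 for e in reading_errors if e < 0)
    high = sum(1 for e in reading_errors if e > 0)
    if high >= 2:
        # FCVs shut off too soon.
        return "underpressurization"
    if low >= 2:
        # FCVs open early; the relief valve is the remaining protection.
        if relief_valve_works:
            return "relief valve operation, loss of usable propellant"
        return "overpressurization"
    # A single failed sensor leaves tank pressure within nominal limits.
    return "nominal"
```

For example, `tank_effect([-1.0, -1.0, 0.0], relief_valve_works=False)` corresponds to the case of two low-reading sensors combined with a vent/relief valve that fails closed.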

LH2  Tank 

LH2 tank pressure is maintained by the tank ullage pressure transducer
control circuit, which provides discrete pressure/flow control valve open
(full flow) and closed (reduced flow) signals in accordance with the sensed
tank pressure. The crew is provided with an override switch, which provides
backup for the condition of two failures in the tank ullage pressure
transducer/flow control valve circuits so that adequate pressurant is supplied
to the LH2 tank. The switch bypasses the control for the pressure/flow control
valves and commands all of the pressure/flow control valves to the normally
open (full flow) position. The switch would be operated if the C&W gave an
indication of lowering manifold pressure. The external tank contains four
ullage pressure transducers, with three of the four used for controlling the
operation of the flow control valves. Each transducer is dedicated
(electronically assigned) to an engine and provides direct control for that
engine's flow control valve. The fourth transducer is switched into the
control circuit should an ullage pressure transducer failure occur prior to
launch. (Ref. 7, p. S-l)

Failure of one sensor reading lower than actual tank pressure will open the
corresponding FCV early. Tank pressure will remain within nominal limits with
one failed sensor. If two or three sensors read lower than actual pressures,
relief valve actuation may be required to prevent overpressurization. Relief
valve operation could cause fire damage to the thermal protection system (TPS)
and loss of usable propellant. (Ref. 3, p. E-A-41)

Two or three sensors reading higher than actual pressure will cause flow
control valves to shut off too soon, causing tank underpressurization. (Ref. 3,
p. En-3)


5.3.2 Pressure Relief

The propellant tank vent/relief valve assemblies provide two functions,
vent and relief. The vent function is only utilized during propellant loading,
launch countdown, and hold periods. The relief function is used to protect
the tank structure when the vent is closed by automatically reducing the ullage
pressure in the event that it exceeds a preset value.

During flight the vent valves assume their normally closed position and act
as safety relief valves to protect against overpressurization. Failure of LO2
or LH2 relief functions would result in tank overpressurization if a secondary
system failure (i.e., flow control valve failure) exists. (Ref. 8, P-A-15, S2)

5.4 POGO SUPPRESSION

Pogo is self-induced longitudinal oscillation involving major vehicle
components, structure, feedlines, turbopumps, and engine. Pogo results in
undesirable low frequency oscillations (typically 5 to 2SHz) with potentially
detrimental effects on the vehicle crew's ability to function and on vehicle
structure and components. (Ref. M, p. 2-127)

Loss of Pogo suppression capability from one engine after liftoff is
considered to result in structural oscillations and feedline pressure of
unpredictable amplitude, which can lead to loss of crew/vehicle. (Ref. 23,
p. 1-7)

A GO2 Pogo suppressor system is incorporated into the LO2 feed system at
the HPOT inlet. The system utilizes a gas filled accumulator to suppress
vehicle-induced flow oscillations. GO2 tapped off the heat exchanger is used as
the pressurization medium following an initial helium precharge. The system
controls liquid level in the accumulator by means of an overflow line which
routes overflow fluids through the recirculation isolation valve (RIV) and the
LO2 bleed line to the manifold feedline upstream of the prevalves. (Ref. 9,
p. 1-7) Refer to Figure 5-5.

The Pogo suppression system consists of a flanged accumulator, standpipe,
helium precharge valve package, GO2 control valve, a recirculation isolation
valve, and two recirculation control valves. The engine controller controls the
valves. (Ref. 9, p. 1-7)

The accumulator is chilled by LO2 during engine chilldown operation. Heat
convection circulation within the accumulator, with optional cycling of the
recirculation isolation valve, allows the accumulator to fill to the
recirculation line level. This level is sufficient to preclude gas ingestion at
start. (Ref. 3, p. 2-128) At termination of engine chilldown, T-12.S seconds,
the Pogo recirculation control valves are opened. (Ref. !?, p. 2.2-7)

During engine start, charging of the accumulator with helium is delayed by
2.4 seconds after the engine start signal to permit the engine to reach a well
behaved portion of its pressure/flow transient. At that point, the controller
signals helium flow through the helium precharge valve. Helium entering the
accumulator forces the LO2 level down to the nominal operating position in
approximately 2.0 seconds. (Ref. 11, p. 2.2-8)

The helium precharge is utilized in the GO2 Pogo suppressor system to
provide a rapid charge and thereby afford Pogo protection during liftoff and the
early part of the flight until gaseous oxygen is available from the engine heat
exchanger. The helium precharge valve (HPV) is also used to provide helium to
the accumulator as a post charge at engine shutdown. The HPV contains a
15-micron filter at the helium inlet. (Ref. 17, p. 1.6-8)

The helium precharge solenoid valve also controls the normally open GO2
control valve. When the solenoid is de-energized, GO2 is supplied to the
accumulator. (Ref. 11, p. 2.2-8)

The GO2 Control Valve (GCV) provides gaseous oxygen pressurant to the Pogo
accumulator during engine operation after the engine heat exchanger is
functioning. A bleed orifice provides fail-safe valve actuation to the open
position. Pneumatic pressure to the closing side of the actuator is also
applied to the opening side through the bleed orifice. This will cause the
valve to reopen approximately 2 seconds after the application of closing
pressure. (Ref. 17, p. 1.6-9)

The normally open Recirculation Isolation Valve (RIV) is actuated closed by
the same pneumatic pressure that opens the normally closed Oxidizer Bleed
Valve. RIV opening during engine operation is ensured by routing gaseous
oxygen from the override port of the GCV to the opening side of the RIV actuator
when the GCV is opened. (Ref. 17, p. 1.S-G)




5.5 ELECTRICAL INSTRUMENTATION AND CONTROL (EI&C) FUNCTIONS

The SSMP EI&C system can be generally classified as performing one of the
following critical functions:


1) Propellant flow rate/mixture regulation

2)  Engine  shutdown  on  demand,  and 

3)  Thrust  vector  adjustment  (gimbaling,  throttle-up  etc.) 

4)  External  fuel  tank  separation  actuation. 


A brief description of each of these systems' operations and associated
hardware is provided in the sections below. Functions provided by the on-board
general purpose computer (GPC) will only be described insofar as system
interfaces are concerned. Analysis of the GPC and EI&C functions (Avionics
System) not strictly supporting SSMP operation is outside the scope of this
effort. The boundaries of EI&C functions included in the analysis are shown in
Figure 5-6.

5.5.1 Avionics System Features and Interfaces

Features associated with the Avionics System include the following:

1.  The primary flight system (PFS) design is based on a centralized
set of quad-redundant general-purpose computers (GPC's) within
the data processing system (DPS), which provides the primary mode
of acquiring flight-critical sensor data, processing the data,
and, finally, generating and delivering guidance, navigation, and
control (GNC) commands to the various vehicle control elements.


2.  Additionally, a single GPC with independently designed and coded
flight software, called the backup flight system (BFS), is available
to take over vehicle control through the primary bus structure from
the PFS, if necessary.

3.  The DPS bus structure contains 24 separate serial digital input/
output (I/O) buses including eight flight-critical (GNC) and five
intercomputer (ICC) buses, which provide for sensitive data
communications and control through the GPC redundant set. The
three engine interface units and two master events controllers are
cross-strapped to the four Flight Critical (FC) buses and provide
interface services between the GPC's and the Main Engine Controllers
and associated events sequencing functions.

4.  The various multiply redundant inertial navigation and flight control
sensors and effectors must be in a constant state of readiness to
perform the fault detection, isolation, and reconfiguration (FDIR)
functions.

5.  The avionics and nonavionics system management (SM) function is
performed in conjunction with the operational instrumentation (OI).

6.  A three-string electrical power distribution and control system
provides single fault-tolerant power to non-flight-critical systems
and dual fault-tolerant power to flight-critical systems.

5.5.2 Propellant Flow Rate/Mixture Control

The SSME's can be throttled over a range of 65 to 109 percent of rated
power level in 1-percent increments. At sea level, the engine throttle range is
restricted by nozzle flow separation. The 65-percent throttle setting is
referred to as minimum power level (MPL).
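As a small illustration of the throttle range described above, the sketch below quantizes a requested power level to 1-percent steps and limits it to the range between MPL and 109 percent. The clamping behavior and the function name are assumptions made for illustration; the report does not describe how out-of-range commands are handled, and the sea-level restriction is not modeled.

```python
MPL_PERCENT = 65    # minimum power level, percent of rated power level
MAX_PERCENT = 109   # upper end of the stated throttle range


def throttle_setting(requested_percent):
    """Return a throttle setting in whole percent within [MPL, 109].

    Hypothetical helper: rounds to the nearest 1-percent increment, then
    clamps to the stated throttle range.
    """
    step = round(requested_percent)
    return max(MPL_PERCENT, min(MAX_PERCENT, step))
```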

All three engines normally receive the same throttle command
simultaneously. The command comes from the GPC's to the MEC's. During certain
contingency operations, manual crew control of engines is possible by use of the
speed brake/MEC handle. Throttling reduces vehicle loads during maximum
aerodynamic pressure, limits longitudinal acceleration to 3 g's during boost,
and makes it possible to abort with all main engines thrusting or with one
engine out.

The controller is an electronics package mounted on each SSME and contains
computers with associated electronics that control all main engine
operations. The controller is attached to the main combustion chamber by
shock-mount fittings.


Each controller operates in conjunction with engine sensors, valves,
actuators, and spark igniters to provide a self-contained system for engine
control, checkout, and monitoring. The controller provides engine flight
readiness verification, engine start and shutdown sequencing, closed-loop thrust
and propellant mixture ratio control, sensor excitation, valve actuator and
spark igniter control signals, engine performance limit monitoring, onboard
engine checkout and response to vehicle commands and transmission of engine
status, and performance and maintenance data.


Each engine controller receives engine commands transmitted by the orbiter
GPC's through an engine interface unit (EIU) dedicated to that engine
controller. The engine controller provides its own commands to the main engine
components. Engine data are sent to the engine controller, where the data are
stored in a vehicle data table (VDT) in the controller's computer memory.
Controller status data compiled by the engine controller's computer are also
added to the vehicle data table. The vehicle data table is periodically output
by the controller to the EIU for transmission to the orbiter GPC's.


The EIU is a specialized multiplexer/demultiplexer (MDM) that interfaces
with the orbiter GPC's and with the engine controller. When engine commands are
received by the EIU, the data are held in a buffer until the EIU receives an
orbiter GPC request for data. The EIU then sends data to each orbiter GPC.

Each EIU is dedicated to one SSME and communicates only with the engine
controller that controls the SSME. The EIU's have no interface with each other.


The controller provides responsive control of engine thrust and mixture
ratio through the digital computer in the controller, updating the
instructions to the engine control elements 50 times per second (every 20
milliseconds). Engine reliability is enhanced by a dual redundant system that
allows normal operation after the first failure and a fail-safe shutdown after a
second failure. High-reliability electronic parts are used throughout the
controller.
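The fail-operational/fail-safe behavior and the update rate described above can be expressed as a brief sketch. The function and constant names are illustrative assumptions; only the 50-per-second update rate and the first-failure/second-failure behavior come from the text.

```python
UPDATES_PER_SECOND = 50
CYCLE_MS = 1000 / UPDATES_PER_SECOND   # 20 milliseconds per control update


def controller_response(failure_count):
    """Map the number of controller failures to the stated system response."""
    if failure_count <= 0:
        return "normal operation"
    if failure_count == 1:
        # Dual redundancy: the remaining channel keeps the engine operating.
        return "normal operation on redundant channel"
    # A second failure results in a fail-safe shutdown.
    return "fail-safe shutdown"
```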

The digital computer is programmable, allowing modification of engine
control equations and constants by change to the stored program (software). The
controller is packaged in a sealed, pressurized chassis and is cooled by
convection heat transfer through pin fins as part of the main chassis. The
electronics are distributed on functional modules with special thermal and
vibration protection.

The controller is divided into five subsystems: input electronics, output
electronics, computer interface electronics, digital computer, and power supply
electronics. Each subsystem is duplicated to provide dually redundant
capability. A simplified redundancy diagram of the controller is shown in
Figure 5-7.

The input electronics receive data from all engine sensors, condition the
signals, and convert them to digital values for processing by the digital
computer. Engine control sensors are dual-redundant, and maintenance data
sensors are non-redundant.

The output electronics convert the computer digital control commands into
voltages suitable for powering the engine spark igniters, the off/on valves, and
the engine propellant valve actuators.


The computer interface electronics control the flow of data within the
controller, input data to the computer, and computer output commands to the
output electronics. They also provide the controller interface with the vehicle
engine electronics interface unit for receiving engine commands, which arrive
over triple-redundant channels from the vehicle, and for transmission of engine
status and data through dual-redundant channels to the vehicle. The computer
interface electronics include the watchdog timers that determine which channel
of the dual redundant mechanization is in control.


During prelaunch, the orbiter GPC's will look at both primary and secondary
data. Loss of either primary or secondary data will result in data path failure
and either engine ignition inhibit or launch pad shutdown of all three SSME's.


At T-0, the orbiter GPC's will request both primary and secondary data from
[illegible] failures, only primary data are looked at. If there is a loss
of primary data (which can occur between the engine controller Channel A
electronics and SSME SOP), then the secondary data are examined.


5.5.3 Engine Isolation on Demand (Shutdown)


Engine shutdown is initiated when pre-established parametric conditions
(redlines) are met and processed through the controller. Propellant flow to the
engines is then cut off by means of the engine flow valves and the orbiter
prevalves on the LO2 and LH2 system lines. This effectively isolates the
external tanks and orbiter plumbing from the engines.

The controller may fail to generate a shutdown command if 1) the engine
interface unit (EIU) or general purpose computer (GPC) fails to send the proper
signals, 2) the electric power is lost to the valves/controller, and 3) if the
pre-established redlines are violated.


The shutdown sequence is initiated when minimum power level (MPL) is
detected. MPL is currently set to 90% of full power. Engines may also be shut
down if high temperature is detected on either high pressure pump turbine
exhaust. Other shutdown parameters include low main burner/chamber pressures
and low tank level.


More discussion regarding system response to shutdown of one or more
engines is provided in Section S.

5.5.4 External Tank Separation

External tank detachment from the orbiter is controlled by the GPC.
Activation of ET/orbiter pyrotechnics occurs after isolation of disconnect
flapper valves and MECO enables the firing [illegible] impact in a
[illegible] landing area and avoiding [illegible] is designed to tumble
shortly after Orbiter/ET [illegible]


The tumble system is initiated just prior to separation by commands from
the Orbiter. The tumble valve arm command is initiated S seconds after MECO is
confirmed by the GPC's, and the valve fire command is initiated 1 second later.
(Ref. 7, p. 5-19)

The first orbiter command arms the circuit by energizing the switch module
relay. This closes the two normally open relay contacts, which completes the
firing circuit to the NASA Standard Detonator (NSD). The second command is the
fire command, which activates the NSD and fires the pyro cartridge that opens a
two-inch valve mounted on the ogive forward ring forging. The residual GO2 is
vented in the +Z axis, providing the required tumble thrust.

5.6 MAIN ENGINE SHUTDOWN

The two failure situations discussed in this section, one and two engine
shutdowns, are analyzed because each is related to the MPPS. However, they are
grouped together for this report because they appear to represent a cross
section of mission techniques. Failure to shut down an engine illustrates
redundancy and two-engine shutdown poses a real-time mission decision.

5.6.1  Failure  to  Shutdown  a Single  Engine 

Engine  shutdown,  whether  initiated  by  the  controller  or  by  crew  command,  is 
a safe  response  to  an  unsafe  operating  condition.  Serious  consequences  may 
result if a main engine fails to shut down on demand. To overcome such a
failure,  several  shutdown  methods  have  been  designed  into  engine  operation. 

Emergency  engine  shutdown  may  be  initiated  from  any  steady  state  or 
transition  thrust  level,  including  engine  start.  The  engine  shutdown  sequence 
is  initiated  by  the  controller  upon  receipt  of  a vehicle  shutdown  command  or 
engine parameters which exceed predetermined redline limits. If the controller
cannot  accomplish  shutdown  via  hydraulic  actuators,  it  will  perform  shutdown 
with  helium  pneumatic  pressure  via  the  Pneumatic  Control  Assembly  (PCA).  If 
malfunctions  are  such  that  the  engine  is  still  operating,  the  crew  can  take 
action,  first,  by  cutting  off  electrical  power  to  the  engine  and,  finally,  by 
closing  the  prevalves  to  stop  the  fuel  flow. 
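The layered shutdown methods described above form an ordered fallback chain, which can be sketched as follows. The list ordering follows the text; the function name and the idea of passing in the set of methods that still work are assumptions made for illustration.

```python
# Shutdown methods in the order the text says they are attempted.
SHUTDOWN_METHODS = [
    "controller: hydraulic actuators",
    "controller: helium pneumatic pressure via PCA",
    "crew: cut electrical power to the engine",
    "crew: close the prevalves",
]


def first_effective_shutdown(working_methods):
    """Return the first method in the chain that still works, else None.

    working_methods is a hypothetical set of the methods assumed to be
    functional in a given failure scenario.
    """
    for method in SHUTDOWN_METHODS:
        if method in working_methods:
            return method
    return None   # engine fails to shut down on demand
```

A return value of `None` corresponds to the failure-to-shut-down event of interest in this section.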

5.6.2 Two Engine Shutdowns

Other considerations aside, the three engines on the Shuttle represent
redundancy. However, this is true only for a single-engine-out situation; i.e.,
the Shuttle can safely return to the launch site or perform one of the other
preplanned aborts on two engines. If two engines shut down, or the second
engine must be shut down by the crew, a safe abort is possible only if the
Orbiter has achieved the velocity threshold that would allow at least a TAL
abort on the one remaining engine. Thus, should a second engine drift out of
[illegible] the [illegible] engine shuts down and [illegible], mission
operations personnel and the crew may have a very difficult decision to make.


Consider a scenario beginning with the shutdown of a single engine soon
after lift-off. When this event occurs, the Orbiter computers send a command to
inhibit shutdown of either of the other two engines.

To enable a second automatic shutdown, the inhibit of one or both of the
remaining engines must fail. This is a credible situation if a communication
path failure between the Orbiter GPC's and an engine had occurred. For the same
reason, the crew would also be unable to inhibit engine shutdown. Such a
situation exposes the flight to the possibility of a second automatic shutdown.

Presuming the remaining two engines have been inhibited from an automatic
shutdown, ground operations personnel and the crew will monitor engine
parameters to identify off-nominal operations of either of the two remaining
engines. The loss of intermediate seal purge is a singularly serious
malfunction. If this happens, the crew may have to shut down the second engine,
even at the risk of a dangerous abort. Another conceivable, though
hypothetical, situation is the shutdown of a second engine with a contained
failure.


Thus,  the  two-engines-out  situation  can  result  in  a loss  of  life  or  vehicle 
if  the  event  occurs  between  lift-off  and  single-engine  TAL. 
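The velocity dependence of the two-engines-out outcome can be illustrated with a minimal sketch. The threshold value is the approximate two-failed-engine TAL velocity quoted in the abort-mode discussion of this report (about 13,000 ft/sec); the function name and the simple two-way classification are assumptions made for illustration only.

```python
# Approximate velocity above which TAL can accommodate two failed engines,
# as quoted in the report's abort-mode description (ft/sec).
SINGLE_ENGINE_TAL_FPS = 13_000


def two_engine_out_outcome(velocity_fps):
    """Classify a two-engines-out event by Orbiter velocity at the event."""
    if velocity_fps > SINGLE_ENGINE_TAL_FPS:
        return "abort on one engine possible"
    return "possible loss of life or vehicle"
```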


5.7 GROUND OPERATIONS (MPS)

Flight preparations between T-8 hours and the end of ET pressurization
consist primarily of four major functions:

1.  System purge

2.  System chilldown

3.  Propellant fill

4.  ET pressurization

Appendix F describes the individual tasks and readings needed to
successfully accomplish these operations. Propellant fill is the process by
which the external tank (LO2 and LH2) is slow- and quick-filled to the 100%
level. Chilldown is the process by which system piping and components are cooled
by the propellants in order to minimize thermal shock. Failure to properly chill
the propellant system pressure boundary may result in gross leakage or piping
rupture due to overstress conditions. System purge and anti-icing are performed
to prevent accumulated contamination (mainly water) from plugging lines
upon the introduction of propellants.

Pneumatic system GSE used in the MPPS consists of three independent
subsystems. Two of the subsystems are identical. The LO2 and LH2 ET
prepressurization subsystems provide gaseous helium to their respective tanks in
order to pressurize those tanks sufficiently to permit SSME start. The GSE then
continues to provide helium pressure to the tanks until lift-off. Each
pressurization subsystem consists of three valves, two in parallel and one as a
[illegible] to the other two. These valves are controlled by the Launch
Processing System (LPS), which senses the ullage pressure in the ET and opens
and closes the valves accordingly. The helium is provided by a 1033 psig
facility source.

The onboard helium system is pressurized prior to flight by the third GSE
subsystem, the primary helium pressurization reduction and bottle fill panel.
This panel regulates helium down to 4.^50 + E0 psig through two parallel
circuits. The helium flow is controlled by shutoff valves, which are opened and
closed by the LPS in order to pressurize the onboard helium tanks without
exceeding temperature limits. The helium is provided by an E000 psig facility
source.


5.8 MISSION ACCOMMODATIONS OF IN-FLIGHT FAILURES

The Space Shuttle mission has been designed to accommodate operational
failures within the flight systems. During the ascent-to-orbit phase of a
mission, a series of abort strategies has been devised to ensure a "fail
operational-fail safe" capability. Simply stated, the strategies may be defined
as follows:

1)  Return to launch site (RTLS) is the initial abort mode, which allows
the vehicle to abort anytime after launch and to return to the Kennedy Space
Center (KSC) runway. The constraints are (a) the loss of no more than one engine
and (b) sufficient main engine propellant to steer the Shuttle on a return
course to KSC with the desired position and velocity state prior to engine
cutoff. Although a critical failure may occur earlier, this mode will not be
activated until approximately 2 minutes 30 seconds after lift-off, when the solid
rocket boosters (SRBs) have burned out and separated from the Shuttle.

2)  Transoceanic abort landing (TAL) is the second abort mode and overlaps
the RTLS capability at approximately 4 minutes after lift-off. This mode
provides the capability for the Orbiter to land at a contingency site, generally
in North Africa or Spain. This option is usually activated following a critical
system failure in order to land as soon as possible. This mode has full
capability to accommodate one failed engine and a limited capability to
accommodate two failed engines (velocities approximately > 13,000 ft/sec).

3)  Abort once around (AOA) provides for an abort landing at Edwards Air
Force Base or at White Sands by achieving the desired hypersonic suborbital
flight state at MECO. This mode, which provides engine-out accommodation
similar to TAL, is initiated because the vehicle flight energy state has
progressed past the autoguidance TAL capability (velocity approximately > 23,000
ft/sec).

4)  Abort to orbit (ATO), the final mode, is an option from the AOA. If
sufficient onboard propulsion capability exists and the critical failure does
not affect mission completion, the Orbiter will be propelled by the Orbital
Maneuvering System (OMS) engines to the desired orbit after ET separation. The
mission may be continued or aborted, depending on the criticality of the system
failure.

TABLE 5-1

Salient Differences and Points of Asymmetry
Between LO2 and LH2 Propellant Systems

Component/Subsystem: External Tank (ET)
    LO2: Propellant tank on the top portion of ET. Has a tumble valve.
    LH2: Propellant tank on the bottom portion of ET. No tumble valve.

Component/Subsystem: Location of ET low level sensing devices (ET
separation parameter)
    LO2: Inside orbiter on main flow line downstream of the ET/orbiter
         disconnect valves.
    LH2: On ET.

Component/Subsystem: Propellant prevalves
    LO2: Open/close circuitry and valve pneumatic actuator differences.
    LH2: Open/close circuitry and pneumatic actuator differences.

Component/Subsystem: POGO suppression and antigeysering line
    LO2: POGO accumulator and isolation valves and antigeysering line.
    LH2: No POGO suppression or antigeysering line.

Component/Subsystem: Low pressure turbopumps
    LO2: Driven by high pressure turbopump discharge pressure. No
         recirculation pump system used.
    LH2: Driven by main engine chamber coolant pressure.

Component/Subsystem: High pressure turbopumps
    LO2: Regulated by LO2 flow control valves on HPOT preburners. Heat
         exchanger coils on the pump preburner.
    LH2: Regulated by LO2 flow control valves on preburners. No LH2 flow
         control valves are used on the HPFT preburners. No heat exchange.

Component/Subsystem: Recirculation pumps
    LO2: No recirculation pump system is used.
    LH2: Recirculation flow initiated for propellant system downstream of
         prevalves.

LO2 PROPELLANT DELIVERY SYSTEM

LH2 PROPELLANT DELIVERY SYSTEM

LO2 TANK PRESSURIZATION SCHEMATIC

Pogo suppression system schematic.

FIGURE 5-6

SIMPLIFIED FUNCTIONAL BLOCK DIAGRAM OF
MPPS ELECTRICAL CONTROL SYSTEM *

* DASHED LINE SIGNIFIES THE CONTROL SYSTEM
BOUNDARY NOT ANALYZED IN THIS STUDY.

Figure 5-7. SSME controller simplified redundancy diagram

Section 6
RISK ASSESSMENT


Risk is typically defined as the product of the probability of an event
occurrence and the severity of its consequences, or

$\text{Risk} = \text{Probability} \times \text{Severity}$          Equation 6-1

For this study, the risk is a loss of life and/or vehicle due to a
combination of MPPS related failures. The probability is that of loss of life
and/or vehicle during a mission. Severity is measured in terms of the number of
potential fatalities and hardware losses resulting from a catastrophic failure.

Equation 6-1 is, however, a simplistic representation of risk. The
severity of an accident varies as a function of the specific failure scenario,
or combination. Consider, for example, a catastrophic loss of engine thrust
within the first few seconds following lift-off. The severity of this failure
will vary depending on whether or not the STS has cleared the launch facility.
Timing of the failure is just one crucial factor in assessing the total
resulting damage. Risk can thus be redefined as the sum of the risks created
during various stages of the mission, or

$\text{Risk} = \sum_i (F \times S)_i$          Equation 6-2

Where F = The probability of catastrophic failure during a mission
          time interval "i",

      S = The severity of the catastrophic failure during interval "i".

Throughout this section, emphasis will be placed on consequence as a
function of mission time.
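As a worked illustration of Equation 6-2, the sketch below sums the product of failure probability and severity over mission time intervals. The function name and the numerical values in the usage note are hypothetical and are not taken from this study.

```python
def total_risk(intervals):
    """Sum F x S over mission time intervals (Equation 6-2).

    intervals is a sequence of (F, S) pairs, where F is the probability of
    catastrophic failure during the interval and S is the severity of that
    failure, in whatever severity units the analyst has chosen.
    """
    return sum(f * s for f, s in intervals)
```

For instance, with two hypothetical intervals `[(0.5, 2.0), (0.25, 4.0)]`, each interval contributes 1.0 and the total is 2.0. The same probability therefore carries a different weight depending on when in the mission the failure occurs.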

6.1  LAUNCH  AND  PRELAUNCH  TIME  SEQUENCE 

An event tree may be used as a method of depicting the various outcomes (or
levels of severity) for time dependent failures. Figures 6-1a and 6-1b show a
time-sequence event tree representing scenarios which would result if a
catastrophic accident occurred during the different intervals of the mission.

The  lower  branches  represent  either  a failure  to  successfully  avoid  a 
catastrophic  event  or  a failure  to  accomplish  a critical  recovery  operation 
(e.g., abort landing). Upper branches represent success.

Two separate event trees are developed to provide some distinction between
those events which are not recoverable or immediately catastrophic (i.e., fire,
explosion, aft compartment overpressurization and other non-recoverable events)
and those events for which an abort is possible. Leakage and rupture of a
propellant system or other pressure boundary create situations in which the
recovery time is very compressed. Figure 6-1a, therefore, represents fairly
straightforward outcome states in which an immediate catastrophe is expected.
Figure 6-1b, however, includes many potential abort scenarios. The consequences
are similarly dependent on the time phase. That is, ground facilities and
personnel associated with the various abort landing sites will be affected by
the abort sequence attempted (i.e., RTLS, TAL, etc.).

From the cutsets generated by CAFTA, a determination is made regarding
which cutsets are applicable to the various branches of the event tree.
Exposure times for each basic event in the cutsets are then carefully reviewed
to ensure that proper probabilities are assigned for each of the time intervals
(refer to Appendix C, Table C-5 for time-phased probabilities). Tables 6-1a and
6-1b identify the pertinent cutsets for each event tree branch. It is
necessary, for practical purposes, to truncate cutsets whose probabilities fall
below a cutoff value (garbled in the source scan as "13 -3").
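
The truncation step can be illustrated with a short Python sketch. The function name and the cutoff used in the example are assumptions made for illustration; the cutoff actually used in the study is not legible in the source. The three sample cutsets are taken from the branch D listing of Table 6-1b as read from the scan.

```python
def truncate_cutsets(cutsets, cutoff):
    """Split cutsets into those kept and the total probability dropped.

    cutsets: list of (basic_event_names, probability) pairs.
    cutoff:  truncation threshold chosen by the analyst (hypothetical here).
    """
    kept = [(events, p) for events, p in cutsets if p >= cutoff]
    dropped_probability = sum(p for _, p in cutsets if p < cutoff)
    return kept, dropped_probability


sample = [
    (("SEPINHIBIT", "CNDTUFCO", "SP13FXFE"), 2.93e-6),
    (("CNDTUFCO", "HUMTSXHC", "SP13FXFE"), 1.47e-7),
    (("MPBLP4LK", "HUMLPXHC"), 2.69e-9),
]

# 1e-8 is an illustrative cutoff only, not the value used in the study.
kept, dropped = truncate_cutsets(sample, cutoff=1.0e-8)
print(len(kept), dropped)
```

Returning the dropped probability alongside the kept cutsets gives a quick check that truncation removes only a negligible share of the branch total.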


The probability of an outcome state is the product of all event tree
branches leading to that state. The probability of each branch of the tree is
obtained by selecting the appropriate portions of the master fault tree and
applying probabilities to the basic events corresponding to the respective
exposure times (i.e., time intervals). Abort scenarios have not been
probabilistically quantified since these are outside the scope of analysis. For
time intervals between T + 30 seconds and zero thrust, fault tree probabilities
were adjusted for fractional exposure times (e.g., subdivisions of flight
phases). The fractions are shown alongside the fault tree mnemonics on the
event trees.
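
A minimal sketch of the two operations described above, assuming that the fractional-exposure adjustment is a simple linear scaling of the fault tree probability (a reasonable approximation only for small probabilities; the text does not state the exact adjustment used). All names and numbers are hypothetical.

```python
import math


def branch_probability(fault_tree_probability, exposure_fraction=1.0):
    """Scale a fault tree probability by the fraction of the phase exposed.

    Linear scaling is an assumption made for this sketch.
    """
    if not 0.0 <= exposure_fraction <= 1.0:
        raise ValueError("exposure_fraction must be between 0 and 1")
    return fault_tree_probability * exposure_fraction


def outcome_state_probability(branch_probabilities):
    """Product of the branch probabilities along one path of the event tree."""
    return math.prod(branch_probabilities)


# Illustrative path: success through an earlier interval, then failure in a
# later interval that covers one quarter of its flight phase.
p_survive_earlier = 1.0 - 2.0e-4
p_fail_later = branch_probability(1.0e-3, exposure_fraction=0.25)
print(outcome_state_probability([p_survive_earlier, p_fail_later]))
```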


6.1.1  Basis  for  Division  of  Time  Intervals 

The divisions shown in the event tree represent time intervals which define
distinct outcomes. In other words, by subdividing the time intervals further,
one would obtain the same number of outcome states, but a greater number of
individual sequences dependent on the subdivided time interval; no significant
additional information regarding risk is obtained from such a subdivision.

The time-line for our mission profile begins with flight preparation
operations at approximately T-9 hours. Major flight preparation operations,
including cryogen fill, system purging and initial system checkout, are performed
mainly during the interval between T-8 hours and crew boarding at T-2 hours.
This interval is chosen as a convenient segment of time because any catastrophic
accidents resulting during these six hours primarily affect the ground support
equipment and launch facility. At the time of crew boarding, the consequences
of a major accident would at least potentially include loss of flight crew
life. Other consequence categories are identified in Table 6-2.

Time intervals during ignition and the flight are similarly divided into
milestone changes in the accident outcome. From T-10 seconds until the time the
STS clears all ground facilities, a catastrophic accident may not be limited
solely to the loss of STS and crew. Ground facilities may be affected by
scattered debris following an explosion. The STS is conservatively assumed to
pose no threat to the ground facilities and non-crew members after 30 seconds.
All other flight operations and sequences are grouped into a single time
interval which extends until MECO and ET separation. It is important to note
that unsuccessful abort landing scenarios have risks associated not only with
the STS/crew, but potentially with personnel, facilities, and other hardware at
the abort landing site.
Events which do not necessarily lead to an immediate catastrophic accident
(e.g., recoverable events) have time intervals subdivided. This is necessary
because abort scenarios resulting from system failures are highly dependent on
the time of the event. The three subdivisions of the fault tree time interval
T + 30 seconds to zero thrust are as follows:

o T + 30 seconds to T + 2.5 minutes

o T + 2.5 minutes to T + 4.0 minutes

o T + 4.0 minutes to zero thrust

These correspond to an RTLS, TAL and orbital abort, respectively.
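
The three subdivisions can be expressed as a simple lookup. The sketch below is illustrative: the function name is hypothetical, the time of zero thrust is left as a parameter because it is not given in this section, and assigning each boundary instant to the later interval is an assumption.

```python
def abort_mode(seconds_after_liftoff, zero_thrust_seconds):
    """Map a failure time to the abort mode of its event tree subdivision.

    Returns None outside the T+30 s to zero-thrust window.
    zero_thrust_seconds is supplied by the caller (not stated in the text).
    """
    t = seconds_after_liftoff
    if t < 30 or t > zero_thrust_seconds:
        return None
    if t < 2.5 * 60:
        return "RTLS"
    if t < 4.0 * 60:
        return "TAL"
    return "orbital abort"


# 500 s is a placeholder for the time of zero thrust.
for t in (10, 60, 200, 300):
    print(t, abort_mode(t, zero_thrust_seconds=500))
```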


6.1.2 Consequence Data

As previously discussed, consequences are measured as the expected number of
fatalities and hardware/facilities losses (in $). No attempt is made in this
analysis to combine these two categories of losses.

Some  conservative  assumptions  are  made  regarding  loss  of  human  life 
following  an  accident: 

o Catastrophic explosions/fires on the launch pad between T-8
  hours and crew boarding at T-2 hours are assumed to
  cause only hardware damage.

o Catastrophic explosions/fires between the time of crew
  boarding and engine start sequence are assumed to cause
  death of crew.

o Accidents occurring between engine start and T+30 seconds are
  assumed to cause death of crew. Additionally, depending on
  the time of failure, flying debris, explosion fragments and
  shock waves are assumed to damage surrounding buildings and
  structures and potentially cause additional injuries/deaths.

o Major accidents after T+30 seconds are assumed to affect
  only the crew, with the exception of potential loss of ground
  personnel at abort landing sites if abort is possible.
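
The loss-of-life assumptions listed above amount to a lookup on mission time. The following sketch is one hedged reading of them: the function and group names are hypothetical, and taking engine start as T-10 seconds is an assumption based on the interval boundary mentioned in Section 6.1.1.

```python
PREP_START_S = -8 * 3600      # T-8 hours
CREW_BOARDING_S = -2 * 3600   # T-2 hours
ENGINE_START_S = -10          # assumption: engine start taken as T-10 seconds
CLEAR_OF_PAD_S = 30           # T+30 seconds


def assumed_human_losses(t_seconds, abort_attempted=False):
    """Return the groups assumed at risk for an accident at t (seconds from T-0)."""
    if t_seconds < PREP_START_S:
        raise ValueError("time is before the analyzed interval")
    if t_seconds < CREW_BOARDING_S:
        return set()                      # hardware damage only
    if t_seconds < ENGINE_START_S:
        return {"crew"}
    if t_seconds <= CLEAR_OF_PAD_S:
        return {"crew", "launch site personnel"}
    groups = {"crew"}
    if abort_attempted:
        groups.add("abort site ground personnel")
    return groups


print(assumed_human_losses(-5 * 3600))   # during cryogen fill and checkout
print(assumed_human_losses(10))          # shortly after lift-off
```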

Similarly, hardware and facilities are affected according to the interval.
Any catastrophic explosion prior to T+30 seconds affects the STS plus the pad.
Except for abort landing scenarios, only the STS and ships in the trajectory
footprint are assumed to be affected once the STS has cleared the launch
facility.

A summary of consequence data is provided in Table 6-3. These losses are
reflected in terms of a probability density function discussed later in this
section.




LMSC-F2230 


6.2 SUMMARY OF RISK COMPUTATIONS
[...] 6-1a and 6-1b are used [...] compute each [...]. The consequence
categories are then quantified according to their risk value using
Equation 6-1. The severity is based on the expected human and
hardware losses specified in Table 6-3. The aggregate probability of each of
the consequence categories is shown for both human and hardware losses in Tables
6-4a and 6-4b, respectively. It is important to note that the losses are
strictly represented by those failures represented by the scope of this PRA.
Other risks not within the scope defined by the fault tree cut sets are
necessarily not factored into these results.

Two separate estimates of aggregate probabilities are presented in Tables
6-4a and 6-4b in order to distinguish between events in which successful abort
was achieved and those events which end in ultimate loss of life/vehicle. No
attempt is made to quantify the likelihood of successful abort landing given a
disabling failure. Therefore the specified total probability (for each of the
consequence categories) is provided to show a range of probabilities with and
without abort recovery.

Abort landings can at best be expected to reduce overall risk of
MPPS-related accidents by less than 10%. The importance of abort landing
towards risk reduction varies depending on the system involved. MPPS failures
are seldom recoverable ones and, therefore, abort scenarios provide minor
overall risk reduction. Most failure probability contributions are due to
non-recoverable failures such as immediate explosions or aft compartment
overpressurization events. In total, the probability of non-recoverable events
is more than one order of magnitude higher than that of recoverable events, or
events in which abort landing is a viable option.
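
The "less than 10%" bound follows from the order-of-magnitude ratio by simple arithmetic, as the sketch below shows. It assumes, for illustration only, that recoverable and non-recoverable events carry the same severity and that every abort succeeds; the function name and the input values are hypothetical.

```python
def max_abort_risk_reduction(p_nonrecoverable, p_recoverable):
    """Upper bound on the fractional risk reduction from perfect abort recovery.

    Assumes equal severity for both event classes (an illustrative assumption).
    """
    total = p_nonrecoverable + p_recoverable
    if total <= 0:
        raise ValueError("total probability must be positive")
    return p_recoverable / total


# If non-recoverable events are exactly ten times as likely as recoverable
# ones, even a perfect abort capability removes only 1/11 of the risk.
print(max_abort_risk_reduction(10.0e-4, 1.0e-4))
```

A ratio larger than ten pushes the bound further below 10%, consistent with the statement above.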

Risk  to  human  life,  as  may  be  expected,  is  almost  exclusively  the  result  of 
loss  of  STS  crew.  The  expected  loss  of  life  due  to  MPPS  failure  is  more  than 
one  death  per  hundred  flights.  A residual,  but  insignificant,  risk  is  posed  to 
other  persons  in  the  general  vicinity  of  the  launch  facility  if  a catastrophic 
explosion  occurs  during  the  first  30  seconds  of  flight. 

Risk to hardware consists primarily of the loss of the STS vehicle and
payload (note: payload loss is not included in cost estimates of STS loss).
The average loss per launch is estimated to be approximately $3M. Facility
damage is the next greatest source of monetary loss, totalling under $0.1M per
launch. The remaining sequences contribute minimally to the total expected
losses.
EVENT TREE QUANTIFICATION
USING FAULT TREE CUTSETS
FOR EXPLOSION, OVERPRESSURIZATION AND
OTHER NON-RECOVERABLE EVENTS


Event Tree Branch     Applicable Cutsets     Cutset Probability     TOTAL
(Figure 6-1a)

CND4Z0IG 


FLGEOSLK 

SPVYPXDP 

VENTPANEL 

nPBVP5LK 

MPBVP ILK 

MPBEOSLK 

CNDEZXIG 

MPBEFSLK 

fiPBVOSLK  . 

MPBYF5LK 

MPBEOPRP 

FLOEOSLK 

SPVCPCDP 


SPYRPCDP 

SPYLPCDP 

BOPEOXRP 

CNDEZXIG 

MPBEJPRP 

ACCRPXDP 

TNKVPODP 

TNKVP6DP 

TNKYP1DP 

TNKVP8DP 

TNKYPEDP 

TNKYP2DP 

TNKVP7DP 

TNKYP9DP 

ACCLPXDP 

TNKVP3DP 

TNKVP4DP 

ACCCPXDP 

MPBVJSLK 

MPBEFPRP 

FLGENSLK 

FIGTNSLK 


OSEUXXST 


CND4Z0IG 

GSEUXX5T 

•* 

CNDEZXIG 

VENTPANEL 

MP8VP3LK 

VENTPANEL 

VENTPANEL 

CNDEZXIG 

FLGEFSLK 

CNDEZXIG 

CNDYZXIG 

CNDVZXIG 

CNDEZXIG 


CNDEZXIG 

BDPEFXRP 


VENTPANEL 

VENTPANEL 

VENTPANEL 

VENTPANEL 

VENTPANEL 

VENTPANEL 

VENTPANEL 

VENTPANEL 

VENTPANEL 

YENTPANEL 


CNDEZXIG 


1.24E- 

8.94E- 

8.32E- 

8.07E- 

8.07E- 

6.57E- 

5.84E- 

4.39E- 

3.98E- 

3.90E- 

3.72E- 

3.65E- 

3.30E- 


3.30E- 
3.30E- 
3.07E- 
3.07E- 
2.99E- 
2.68E- 
2.68E- 
2.68E- 
2.68E- 
2.68E- 
2.68E- 
2.68E- 
2.68E- 
2.68E- 
2.68E- 
2.68E- 
2.68E- 
2.68E- 
2.67E- 
2.52E- 
2. 1 9E- 
2.19E- 


2.02E-04 
MPBENSLK

2.19E-06 

MPBYNSLK 

MPBLP3DP 

YENTPANEL 

2.19E-06 
2. 1 IE-06 

MPBRP5DP 

YENTPANEL 

2.07E-06 

MPBCP1DP 

YENTPANEL 

2.07E-06 

PTPBENPRP 

FLOTJSLK 

FLGE08LK 

CNDGRLK 

1.73E-06 

1.46E-06 

1.24E-06 

5CHVP6RP 

YENTPANEL 

1.10E-06 

SCHVP5RP 

YENTPANEL 

1.10E-Q6 

ACCCOMRP 

ACCLOflRP 

ACCROMRP 

SLVCFXOP 

SLYCOXOP 

SLYLFXOP 

5LYRFX0P 

SLYLOXOP 

SLVROXOP 

MPBVOPRP 

CNDYZXIG 

1.05E-06 
1.05E-06 
1.05E-06 
9.19E-07 
9.19E-07 
9.19E-07 
9.19E-07 
9. 19E-0? 
9.I9E-07 
8.19E-07 

MPBYJPRP 

REGRP3CS 

REGVPXHI 

YENTPANEL 

7.74E-07 

7.00E-07 

7.00E-07 

REGRP30P 

REGLP80P 

REGLP2CS 

REGCP1CS 

REGRP90P  j 

REGCP7CS 

REGCPIOP 

REGCP70P 

REGLP8CS 

REGRP9CS 

REGLP20P 

TPSROXRP 

TPSRFLLK 

TPSLOXRP 

TPSLFLLK 

TPSCGXRP 

TPSCFLLK 

I1PBEGSLK 

CNDGRLK 

7.00E-07 

7.00E-07 

7.00E-07 

7,00E-07 

7.00E-07 

7.00E-07 

7.00E-07 

7.00E-07 

7.00E-07 

7.00E-07 

7.00E-07 

6.77E-07 

6.77E-07 

6.77E-07 

6.77E-07 

6.77E-07 

6.77E-07 

6.57E-07 

MPBVNPRP 

FLGEFSLK 

CNDGRLK 

6.42E-07 

5.84E-07 

MPBVFPRP 

CNDYZXIG 

5.75E-07 

MPBEF5LK 

CNDGRLK 

4.39E-07 

MPBTNPRP 

MPBYOSLK 

CNDGRLK 

3.98E-07 

3.98E-07 

MPBYFSLK 

CNDGRLK 

3.90E-07 

PNYRFZCS 

t 

3.78E-0? 
PNV3NDCS

3.76E-07

PNV30DCS 

3.76E-0? 

PNVLOZCS 

3.76E-07 

PNVLFZCS 

3.76E-07 

PNY3JDC5 

3.76E-07 

PNYROZCS 

3.76E-07 

PNVCFZCS 

3.76E-07 

PNVCOZCS 

3.76E-07 

PNY3FDC5 

3.76E-07 

MPBEOPRP 

cnoorlk. 

3.72E-07 

8DPEFXRP 

CNDGRLK 

3.07E-07 

8DPE0XRP 

CNDGRLK 

3.07E-07 

MPBLP3LK 

YENTPANEL 

2.98E-Q7 

MPBRP5LK 

VENTPANEL 

2.98E-07 

MPBCP1LK 

YENTPANEL 

2.98E-07 

MPBTJPRP 

2.66E-0? 

MPBEFPRP 

CNDGRLK 

2.52E-07 

MPBYP2LK 

YENTPANEL 

2.52E-07 

MPBYP4LK 

VENTPANEL 

2.52E-07 

MPBYP6LK 

VENTPANEL 

2.52E-07 

PRBROSLK 

2.45E-07 

PRBRFSLK 

2.45E-07 

PRBCF5LK 

2.45E-07 

PRBLFSLK  • 

2.45E-07 

PRBLOSLK 

2.45E-07 

PRBCOSLK 

2.45E-07 

MPETNSLK 

2.44E-07 

MPBTJSLK 

2.44E-07 

PAVOTXPA 

2.44E-07 

FILRCPLK 

2.27E-07 

FIILCPLK 

2.27E-07 

FILCPCLK 

2.27E-07 

FLGTOXLK 

CNDGRLK 

2.19E-07 

CNDEZXiG 

WLDEOXLK 

1.76E-07 

F1LLFYRP 

1.74E-07 

F1LRFYRP 

1.74E-07 

FILCFYRP 

1.74E-07 

FILROYRP 

1.74E-07 

FILCOYRP 

1.74E-07 

! 

FILLOYRP 

1.74E-07 

MPBCP2LK. 

VENTPANEL 

1.60E-07 

j 

MP8RP6LK 

VENTPANEL 

1.60E-07 

WLDEJXLK 

1.42E-07 

BL0R0RR6 

1.40E-07 

BLOCOGRG 

1.40E-07 

BLOLOGRG 

1.40E-07 

BLOLORRG 

1.40E-07 

8L0R0GRG 

1.40E-07 

BLXORRO 

1.40E-07 

! 1 
MPBLP4LK

YENTPANEl 

I.37E-0T 

TNKVP3DP 

HUMRPXHC 

1.34E-07 

TNKYP 1 DP 

HUMCPXHC 

1.34E-07 

TNKYP9DP 

HUMLPXHC 

1.34E-07 

TNKYP 8DP 

HUMLPXHC 

1.34E-07 

TNKYP2DP 

HUMLPXHC 

1.34E-07 

TNKVP6DP 

HUMCPXHC 

1.34E-07 

TNKYP7DP 

HUMCPXHC 

1.34E-07 

TNKYPEDP 

HUMRPXHC 

I.34E-07 

TNKYPODP 

HUMRPXHC 

1.34E-07 

HGMEZSLK 

1.19E-07 

MPBLP4DP 

YENTPANEL 

9. 19E-08 

MPBCP2DP 

YENTPANEL 

9.19E-08 

MPBRP6DP 

VENTPANEL 

9.19E-08 

MPBVOPRP 

CNDGRLK 

8.19E-08 

WLDENXLK 

8.17E-08 

MPBYFPRP 

CNDGRLK 

5.75E-08 

WLDYP3LK 

VENTPANEL 

5.05E-08 

WLDYP1LK 

VENTPANEL 

4.94E-08 

WLDVP5LK 

VENTPANEL 

4.94E-08 

CNDVZXIG 

WLDVOXLK 

3.88E-08 

MPBRPXLK 

HUMRPXHC 

3.78E-08 

' 

MPBLPXLK 

HUMLPXHC 

3.78E-08 

MPBCPXLK • 

HUMCPXHC 

3.78E-08 

MPBTOPRP 

CNDGRLK 

3.76E-08 

WLDYUXLK 

3.67E-08 

PRVLP20P 

YENTPANEL 

3.62E-08 

i 

| 

PRYLP90P 

VENTPANEL 

3.62E-08 

PRVRP30P 

YENTPANEL 

3.62E-08 

PRVRPOOP 

VENTPANEL 

3.62E-08 

PRVCPIOP 

YENTPANEL 

3.62E-08 

PRYCP80P 

YENTPANEL 

3.62E-08 

PRYHFXOP 

3.62E-08 

PRVOOXOP 

3.62E-08 

PRYVPXOP 

VENTPANEL 

3.62E-08 

PNVRPMRG 

CNDMXXTM 

3.21E-08 

PNVCPMRG 

CNDMXXTM 

3.21E-08 

PNVLPMRG 

CNDMXXTM 

3.21  E-08 

WLDVNXIK 

3.04E-08 

WLDVPXLK 

VENTPANEL 

2.77E-08 

WLDYFXLK 

CNDVZXIG 

2.72E-08 

HEXCOPRP 

2.47E-08 

HEXROPRP 

2.47E-08 

HEXLOPRP 

2.47E-08 

MPBTFSLK 

CNDGRLK 

2.44E-08 

FLGTFXLK 

CNDGRLK 

2.44E-08 

MPBTOSLK 

CNDGRLK 

2.44E-08 

WLDTNXLK 

1.89E-08 

WLDEOXLK 

CNDGRLK 

1.76E-Q8 
TPDCFLSZ

1.0IE-02 

> 

TPDLFLSZ 

1.61E-05 

tpdrfisz 

1.61E-05 

TPDCOtSZ 

1.6  IE-05 

TPDL0L5Z 

1.61E-08 

TPDROLSZ 

1.61  E-06 

MPBRP5LK 

HUMRPXHC 

1.49E-05 

MP8LP3LK 

HUMLPXHC 

I.49E-08 

MPBCP1LK 

HUMCPXHC 

1.49E-08 

WLDTJXLK 

1.26E-08 

WLDEFXtK 

CNDEZXIG 

1.20E-08 

CKVLPXCL 

1.16E-08 

CKVCPXCL 

1.16E-08 

CKVRPXCL 

1.16E-08 

MPBRP6LK 

HUMRPXHC 

8.0  IE-09 

MPBCP2LK 

HUMCPXHC 

8.01E-09 

1 

MPBLP4LK 

HUMLPXHC 

6.87E-09 

MPBTFPRP 

CNDGRLK 

6.64E-09 

WLDVOXLK 

CNDGRLK 

3.88E-09 

WLDVFXLK 

CNDGRLK 

2.72F-09 

IWLDYP4LK 

VENTPANEL 

2.72E-09 

WLDYP2LK 

VENTPANEL 

2.72E-G9 

WLDYP6LK 

VENTPANEL 

2.72E-09 

PLGLOPCL 

1.97E-09 

PLGCOPCL 

1.97E-09 

PLGROPCL 

1.97E-09 

WLDTOXLK 

CNDGRLK 

1.78E-Q9 

WLDEFXLK 

CNDGRLK 

1 .20E-09 

0 

FLGEOSLK 

CNDEZXIG 

1.29E-04 

2.05E-03 

SPVVPXDP 

VENTPANEL 

9.32E-05 

VENTPANEL 

NPBYP3LK 

8.68E-05 

MP8VP51K 

VENTPANEL 

8.42E-05 

MPBVP1LK 

VENTPANEL 

8.42E-05 

MPBEOSLK 

CNDEZXIG 

6.85E-0S 

CNDEZXIG 

FLGEFSLK 

6.09E-05 

MPBEFSLK 

CNDEZXIG 

4.57E-05 

MPBVOSLK 

CNDVZXIG 

4.1SE-05 

MPBYFSLK 

CNDVZXIG 

4.06E-05 

MPBEOPRP 

CNDEZXIG 

3.88E-05 

FLGEJSLK 

3.80E-05 

SPVCPCDP 

3.44E-05 

SPRPCDP 

3.44E-05 

SPVLPCOP 

3.44E-05 

BDPEOXPP 

CNDEZXIG 

3.20E-05 

CNDEZXIG 

BDPEFXRP 

3.20E-05 

MPBEUPRP 

3. 1 IE-05 

ACCRPXDP 

2.80E-05 

TNKVPQDP 

VENTPANEL 

2.80E-05 

TNKVP6DP 

VENTPANEL 

2.80E-05 

I 
TNKVP1DP

YENTPANEL 

2.80E-05 

TNKVP8DP 

YENTPANEL 

2.80E-05 

TNKYPEDP 

YENTPANEL 

2.80E-05 

TNKYP2DP 

YENTPANEL 

2.80E-05 

TNKVP7DP 

VENTPANEL 

2.90E-05 

TNKVP9DP 

VENTPANEL 

2.80E-05 

ACCLPXDP 

2.80E-05 

TNKVP3DP 

YENTPANEL 

2.80E-05 

TNKVP4DP 

YENTPANEL 

2.80E-05 

ACCCPXDP 

2.80E-05 

MPBVOSLK 

2.79E-05 

MPBEFPRP 

CNDEZXIG 

2.63E-05 

FLGENSLK 

2.29E-05 

FLGTNSLK 

2.29E-05 

MPBENSLK 

2.29E-05 

MPBYNSLK 

2.29E-05 

MPBLP3DP 

YENTPANEL 

2.20E-05 

MPBRP5DP 

YENTPANEL 

2.15E-05 

MPBCP1DP 

YENTPANEL 

2.15E-05 

MPBENPRP 

I.80E-05 

FLGTJSLK 

1.52E-05 

FLGEOSLK 

CNDGRLK  ' 

1.29E-05 

SCHVP6RP 

VENTPANEL 

t.ME-05 

SCHVP5RP 

VENTPANEL 

l.ME-05 

ACCCOMRP 

1.10E-05 

ACCLOMRP 

1.10E-05 

ACCROMRP 

1.10E-05 

MPBYOPRP 

CNDYZXIG 

8.54E-06 

MPBVJPRP 

8.07E-06 

REGCPICS 

7.30E-06 

! REGCP 1 OP 

7.30E-06 

REGCP7CS 

7.30E-06 

REGCP70P 

7.30E-06 

REGLP2CS 

7.30E-06 

REGLP20P 

7.30E-06 

REGLP8CS 

7.30E-06 

REGLP80P 

7.30E-06 

RE6RP3CS 

7.30E-06 

REGRP30P 

7.30E-06 

REGRP9CS 

7.30E-06 

REGRP90P 

7.30E-06 

REGVPXHI 

YENTPANEL 

7.30E-06 

TPSCFLLK 

7.06E-06 

TPSLFLLK 

7.06E-06 

TPSRFLLK 

7.06E-06 

TPSCOXRP 

7.06E-06 

TPSROXRP 

7.06E-06 

TPSLOXRP 

7.06E-06 

nPBEOSLK  |i 

CNDGRLK 

6.85E-C6 

I 
Table 6-1a


mpbynprp 

0.69E-C0 

FLGEFSLK 

CNDGRLK 

6.09E-06 

MPBVFPRP 

CNDVZX1G 

6.00E-06 

MPBEFSLK 

CNDGRLK 

4.S7E-06 

MPBTNPRP 

4.15E-06 

MP6V0SLK 

CNDGRLK 

4.15E-06 

MPBYFSLK 

CNDGRLK 

4.06E-06 

PNVCFZCS 

3.92E-06 

PNYLQZCS 

3.92E-06 

PNV3NDCS 

3.92E-06 

PNV30DCS 

3.92E-06 

PNY3FDC5 

3.92E-06 

PNVCOZCS 

3.92E-06 

PNYRFZC5 

3.92E-06 

PNVROZCS 

3.92E-06 

PNV3JDC5 

3.92E-06 

PNVLFZCS 

3.92E-06 

MPBEQPRP 

CNDGRLK 

3.88E-06 

BDPEFXRP 

CNDGRLK 

3.20E-06 

BDPEOXRP 

CNDGRLK 

3.20E-06 

MPBLP3LK 

YENTPANEL 

3.10E-06 

MPBRP5LK 

VENTPANEL 

3.10E-06 

MPBCP1LK 

YENTPANEL 

3.10E-06 

MPBTUPRP 

2.77E-06 

MPBEFPRP 

CNDGRLK 

2.63E-06 

MPBVP2LK 

YENTPANEL 

2.63E-06 

MPBVP4LK 

VENTPAnEl 

2.63E-06 

MPBVP6LK 

VENTPANEL 

2.63E-06 

PRBROSLK 

2.55E-06 

PRBRFSLK 

2.55E-06 

PRBCF5LK 

2.55E-06 

PRBLFSLK 

2.55E-06 

PRBLOSLK 

2.55E-06 

PRBCOSLK 

2.S5E-06 

fiPBTNSLK 

2.54E-06 

MPBTJSLK 

2.54E-06 

PAVOTXPA 

2.54E-06 

FILRCPLK 

2.37E-06 

F1LLCPLK 

2.37E-06 

FILRCPLK. 

2.37E-06 

FLGTOXLK 

CNDGRLK 

2.29E-06 

CNDEZXIG 

WLDEOXLK 

1.84E-06 

FILLFYRP 

1.81E-06 

FILRFYRP 

1.81E-06 

FILCFYRP 

1.8  IE-06 

FILROYRP 

1.81  E— 06 

FILCOYRP 

1.8  IE-06 

FiLLOYRP 

1.8  IE-06 

MPBCP2LK 

VENTPANEL 

1 .67E-06| 

LMSC-F22304 
MPBRP5LK  VENTPANEL

WLDEJXLK 

BLORORRG 

BLOCOGRG 

BLOLOGRG 

8L0L0RRG 

BLOROGRG 

BLOCORRG 

MPBLP4LK  YENTPANEL 

TNKVP3DP  HUMRPXHC 

TNKYP 1 DP  HUMCPXHC 

TNKYP9DP  HUMLPXHC 

TNKVP6DP  HUMLPXHC 

TNKYP2DP  HUMLPXHC 

TNKYP6DP  HUMCPXHC 

TNKYP  7DP  HUMCPXHC 

TNKVPEDP  HUMRPXHC 

TNKVPODP  HUMRPXHC 

HGMEZSLK 

MPBLP4DP  YENTPANEL 

MPBCP2DP  YENTPANEL 

MPBRP6DP  VENTPANEL 

MPBYOPRP  CNDGRLK 

MPBVFPRP  CNDGRLK 

WLDYP3LK  VENTPANEL 

WLDVPILK  YENTPANEL 

WLDVP5LK  YENTPANEL 

CNDYZXIG  WLDYOXCLK 

MPBRPXLK  HUMRPXHC 

MPBLPXLK  HUMLPXHC 

MPBCPXLK  HUMCPXHC 

MPBTOPRP  CNDGRLK 

WLDVJXLK 

PRVLP20P  YENTPANEL 

PRVLP90P  YENTPANEL 

PRVRP30P  VENTPANEL 

PRVRPOOP  VENTPANEL 

PRVCPIOP  VENTPANEL 

PRVCP80P  YENTPANEL 

PRVHFXOP 
PRVOOXOP 

PRVVPXOP  VENTPANEL 

PNVRPMRG  CNDMXXTM 

PNVLPMRG  CNDMXXTM 

PNVCPMRG  CNDMXXTM 

WLDVNXLK 

WLDYPXLK  YENTPANEL 

Y/LDVFXLK  CNDVZXIG 

HEXCOPRP 


I.67E 

-06 

I.48E 

-06 

1.46E 

-06 

1.46E 

-06 

1.46E 

-06 

1.46E 

-06 

1.46E 

-06 

1.46E 

-06 

1.43E 

-06 

I.40E 

-06 

1.40E 

-06 

1.40E 

-06 

1.40E- 

-06 

1.40E- 

-06 

1.40E- 

-06 

1.4QE- 

-06 

1.40E- 

-06 

I.40E- 

-06 

1.24E- 

-06 

9.59E- 

-07 

9.59E- 

-07 

9.59E- 

■07 

8.54E- 

■07 

6.00E- 

■07 

5.27E- 

•07 

5.15E- 

-07 

5. 1 SE- 

•07 

TOSE - 

■07 

3.94E- 

•07 

3.94E- 

•07 

3.94E- 

■07 

3.92E- 

•07 

3.83E- 

•07 

3.77E- 

07 

3.77E- 

07 

3.77E- 

0? 

3.77E- 

07 

3.77E- 

07 

3.77E- 

07 

3.77E- 

07 

3.77E- 

07 

3.77E- 

07 

3.35E- 

07 

3.35E- 

07 

3.35E- 

0? 

3.17E- 

07 

2.88E- 

07 

2.84E- 

07 

2.S8E- 

n*7 
~ « 1 
HEXROPRP

HEXLOPRP 

MPBTFSLK 

figtfxlk 

MPBTOSLK 

WLDTNXLK 

WLDEOXLK 

TPDCFLS2 

TPDLFLSZ 

tpdrflsz 

TPDCOLSZ 

TPDLOISZ 

TPDROLSZ 

CNDGRLK 

CNDGRLK 

CNDGRLK 

CNDGRLK 

Z.5SE-07 

2.58E-07 

2.54E-07 

2.54E-07 

2.54E-Q7 

1.97E-07 

1.84E-07 

I.68E-07 

1.65E-07 

1.68E-07 

1.68E-07 

1.68E-07 

1.68E-07 

MPBRP5LK 

HUMRPXHC 

1.55E-07 

MPBLP3LK 

HUMLPXHC 

1.55E-07 

MPBCP1LK 

HUMCPXHC 

1.S5E-07 

CKVLPXCL 

1.20E-07 

CKVCPXCL 

1.20E-07 

CKYRPXCL 

1.20E-07 

MPBRP6LK 

HUMRPXHC 

8.35E-08 

MPBCP2LK 

HUMCPXHC 

8.35E-08 

MPBLP4LK 

HUMLPXHC 

7.17E-08 

MPBTFPRP 

CNDGRLK 

6.92E-08 

PN2R0Z0P 

CNDMXXTM 

4.49E-08 

PN2C0Z0P 

CNDMXXTM 

4.49E-08 

i 

PN2L0Z0P 

CNDMXXTM 

4.49E-08 

WLDVOXLK 

CNDGRLK 

4.05E-08 

WLDYP2LK 

YENTPANEL 

2.89E-08 

WLDYP4LK 

VENTPANEl 

2.89E-08 

WLDYP6LK 

1 YENTPANEL 

2.89E-08 

WLDVFXLK 

CNDGRLK 

2.84E-08 

i 

WLDVP4LK 

VENTPANEL 

2.83E-08 

WLDVP2LK 

VENTPANEL 

2.83E-08 

WLDYP6LK 

j VENTPANEL 

2.83E-08 

WLDTOXLK 

1 CNDGRLK 

1.86E-08 

WLDEFXLK 

CNDGRLK 

1.25E-08 

E 

SPYCPCDP 

1.41E-06 

K29E-05 

5PYRPCDP 

1.41E-06 

SPVLPCDP 

1.41E-06 

80PE0XRP 

CNDEZXIG 

1.32E-06 

BDPEFXRP 

CNDEZXIG 

I.32E-06 

ACCRPXDP 

1.15E-06 

ACCLPXDP 

USE-06 

ACCCPXDP 

U5E-06 

MPBCP1DP 

VENTPANEL 

8.85E-07 

SCHVP6RP 

VENTPANEL 

4.70E-07 

SCHYP5RP 

YENTPANEL 

4.70E-07 

MPBLP3LK 

YENTPANEL 

1.28E-0? 

MPBRP5LK 

YENTPANEL 

I.28E-07 
MPBCP1LK

YENTPANEL 

1.28E-07 

MPBLP4LK 

YENTPANEL 

5.89E-08 

MP8CP2DP 

VENTPANEL 

3.94E-08 

ttPBLP4DP 

YENTPANEL 

3.94E-08 

MPBRP6DP 

VENTPANEL 

3.94E-08 

MPBRPXLK 

HUMRPXHC 

1.62E-08 

MPBLPXLK 

HUMLPXHC 

1.62E-08 

MPBCPXLK 

HUMCPXHC 

1.62E-08 

PRYLP20P 

VENTPANEL 

1.55E-08 

PRYLP90P 

VENTPANEL 

1.55E-08 

PRVRP30P 

YENTPANEL 

1.55E-08 

PRVRPOOP 

YENTPANEL 

1.55E-08 

PRVCP10P 

YENTPANEL 

1.55E-08 

PRVCP90P 

VENTPANEL 

1.55E-08 

PRVHFXOP 

1.5SE-0Q 

PRV00X0P 

1.55E-08 

i 

PRVVPXOP 

VENTPANEL 

1.S5E-08 

WLDYP4LK 

YENTPANEL 

1.16E-09 

WLDVP2LK 

YENTPANEL 

1.16E-09 

WLDYP6LK 

YENTPANEL 

1.16E-09 

LMSC-F22304C 
Table 6-1b


LMSC-F223O40 


EVENT TREE QUANTIFICATION USING

FAULT  TREE  CUTSETS  FOR  RECOVERABLE  EVENTS 


CUTSETS 


A 

hPBRPXLK 

HUMRPXHC 

MPBIPXIK 

HUMLPXHC 

MPBCPXLK 

HUMCPXHC 

SPVLP3DC 

HUMLPXHC 

MPBVP3LK 

SPVCPIDC 

MPBVP1LK 

HUMCPXHC 

SPVRP50C 

MPBVP5LK 

HUMRPXHC 

MPBCP1LK 

HUMCPXLK 

MPBLP31K 

HUMLPXHC 

HPBRP5LK 

HUMRPXHC 

MP6RP6LK 

HUMRPXHC 

MPBCP2LK. 

HUMCPXHC 

MPBLP4LK 

HUMLPXHC 

CNDFXXTF 

CNDFXXSR 

SDLEFT 

CNDFXXTF 

CNDFXXSR 

SDCENT 

CNDFXXTF 

CNDFXXSR. 

SDCENT 

CNDFXXTF 

CNDFXXSR  . 

SDRIGHT 

CNDFXXTF 

CNDFXXSR 

SDRIGHT 

CNDFXXTF  j 

CNDFXXSR 

SDCENT 

CNDFXXTF 

CNDFXXSR 

SDCENf 

CNDFXXTF 

CNDFXXSR 

SDRI6HT 

CNDFXXTF 

CNDFXXSR 

SDRIGHT 

CNDFXXTF 

CNDFXXSR 

SDLEFT 

CNDFXXTF 

CNDFXXSR 

SDLEFT 

CNDFXXTF 

CNDFXXSR 

SDLEFT 

PNVLPMRG 

CNDFXXTF 

CNDFXXSR 

CNDFXXTF 

CNOFXXSR 

SDRIGHT 

CNDFXXTF 

CNDFXXSR 

SDRI6HT 

CNDFXXTF 

CNDFXXSR 

SDLEFT 

CNDFXXTF 

CNDFXXSR 

i 

SDCENT 

CUTSET 

PROBABILITY 


TOTAL  FOR  A 


HY210JCD 

HY2CFJCD 

HY2CFWCD 

HY2R0WCD 

HY2ROJCD 

HY2C0JCD 

HY2COWCO 

HY2RFJCD 

HY2RFWCD 

HY2LFJCD 

HY2L0WCD 

HY2LFWCD 

SOLEFT 

PNVRPMRG 

CK2RF2CD 

CK.21FZCD 

CK2CF2CD 


2.2SE-07 
3,48E-08 
3.48E-08 
3.48E-08 
1 .92E-06 
I 86E-06 
1 .86E-OS 
1 .37E-08 
\ .37E-08 
1 .37E-06 
7.39E-09 
7.39E-09 
6.33E-09 
1.59E-10 
1.39E-10 
I.59E-10 
1.59E-10 
1.59E-10 
1.S9E-10 
1.S9E-10 
1.S9E-10 
1 59E-10 
1.59E-10 
1.59E-10 
1.59E-10 
1.35E-10 
1.35E-10 
4.87E-1 1 
4.87E-11 
4.87E-1 1 


TOTAL FOR B:  7.19E-07

CUTSET                                          CUTSET PROBABILITY

MPBRPXLK  HUMRPXHC                              1.05E-07
MPBLPXLK  HUMLPXHC                              1.05E-07
MPBCPXLK  HUMCPXHC                              1.05E-07
SPVLP3DC  HUMLPXHC  MPBVP3LK                    5.75E-08
SPVRP5DC  MPBVP5LK  HUMRPXHC                    5.57E-08
SPVCP1DC  MPBVP1LK  HUMCPXHC                    5.57E-08
MPBCP1LK  HUMCPXLK                              4.12E-08
MPBLP3LK  HUMLPXHC                              4.12E-08
MPBRP5LK  HUMRPXHC                              4.12E-08
MPBRP6LK  HUMRPXHC                              2.22E-08
MPBCP2LK  HUMCPXHC                              2.22E-08
MPBLP4LK  HUMLPXHC                              1.90E-08
CNDFXXTF  CNDFXXSR  SDLEFT   HY2LOJCD           3.40E-09
CNDFXXTF  CNDFXXSR  SDCENT   HY2CFJCD           3.40E-09
CNDFXXTF  CNDFXXSR  SDCENT   HY2CFWCD           3.40E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  HY2ROWCD           3.40E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  HY2ROJCD           3.40E-09
CNDFXXTF  CNDFXXSR  SDCENT   HY2COJCD           3.40E-09
CNDFXXTF  CNDFXXSR  SDCENT   HY2COWCD           3.40E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  HY2RFJCD           3.40E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  HY2RFWCD           3.40E-09
CNDFXXTF  CNDFXXSR  SDLEFT   HY2LFJCD           3.40E-09
CNDFXXTF  CNDFXXSR  SDLEFT   HY2LOWCD           3.40E-09
CNDFXXTF  CNDFXXSR  SDLEFT   HY2LFWCD           3.40E-09
PNVLPMRG  CNDFXXTF  CNDFXXSR  SDLEFT            2.88E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  PNVRPMRG           2.88E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  CK2RFZCD           1.04E-09
CNDFXXTF  CNDFXXSR  SDLEFT   CK2LFZCD           1.04E-09
CNDFXXTF  CNDFXXSR  SDCENT   CK2CFZCD           1.04E-09
TOTAL FOR C:  5.39E-07

CUTSET                                          CUTSET PROBABILITY

MPBRPXLK  HUMRPXHC                              7.84E-08
MPBLPXLK  HUMLPXHC                              7.84E-08
MPBCPXLK  HUMCPXHC                              7.84E-08
SPVRP5DC  MPBVP5LK  HUMRPXHC                    4.18E-08
SPVCP1DC  MPBVP1LK  HUMCPXHC                    4.16E-08
SPVLP3DC  HUMLPXHC  MPBVP3LK                    4.31E-08
MPBCP1LK  HUMCPXLK                              3.09E-08
MPBLP3LK  HUMLPXHC                              3.09E-08
MPBRP5LK  HUMRPXHC                              3.09E-08
MPBRP6LK  HUMRPXHC                              1.66E-08
MPBCP2LK  HUMCPXHC                              1.66E-08
MPBLP4LK  HUMLPXHC                              1.43E-08
CNDFXXTF  CNDFXXSR  SDLEFT   HY2LOJCD           2.55E-09
CNDFXXTF  CNDFXXSR  SDCENT   HY2CFJCD           2.55E-09
CNDFXXTF  CNDFXXSR  SDCENT   HY2CFWCD           2.55E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  HY2ROWCD           2.55E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  HY2ROJCD           2.55E-09
CNDFXXTF  CNDFXXSR  SDCENT   HY2COJCD           2.55E-09
CNDFXXTF  CNDFXXSR  SDCENT   HY2COWCD           2.55E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  HY2RFJCD           2.55E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  HY2RFWCD           2.55E-09
CNDFXXTF  CNDFXXSR  SDLEFT   HY2LFJCD           2.55E-09
CNDFXXTF  CNDFXXSR  SDLEFT   HY2LOWCD           2.55E-09
CNDFXXTF  CNDFXXSR  SDLEFT   HY2LFWCD           2.55E-09
PNVLPMRG  CNDFXXTF  CNDFXXSR  SDLEFT            2.16E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  PNVRPMRG           2.16E-09
CNDFXXTF  CNDFXXSR  SDRIGHT  CK2RFZCD           7.79E-10
CNDFXXTF  CNDFXXSR  SDLEFT   CK2LFZCD           7.79E-10
CNDFXXTF  CNDFXXSR  SDCENT   CK2CFZCD           7.79E-10
TOTAL FOR D:  1.65E-05

CUTSET                                          CUTSET PROBABILITY

SEPINHIBIT  CNDTUFCO  SP130XFE                  2.93E-06
SEPINHIBIT  CNDTUFCO  SP230XFE                  2.93E-06
SEPINHIBIT  CNDTUFCO  SP13FXFE                  2.93E-06
SEPINHIBIT  CNDTUFCO  SP23FXFE                  2.93E-06
SEPINHIBIT  CKTOOFCD  CNDTUFCO                  2.03E-06
CKTHFFCD  SEPINHIBIT  CNDTUFCO                  2.03E-06
CNDTUFCO  HUMTSXHC  SP130XFE                    1.47E-07
CNDTUFCO  HUMTSXHC  SP230XFE                    1.47E-07
CNDTUFCO  HUMTSXHC  SP13FXFE                    1.47E-07
CNDTUFCO  HUMTSXHC  SP23FXFE                    1.47E-07
MPBRPXLK  HUMRPXHC                              1.48E-08
MPBLPXLK  HUMLPXHC                              1.48E-08
MPBCPXLK  HUMCPXHC                              1.48E-08
SPVLP3DC  HUMLPXHC  MPBVP3LK                    8.14E-09
SPVRP5DC  MPBVP5LK  HUMRPXHC                    7.90E-09
SPVCP1DC  MPBVP1LK  HUMCPXHC                    7.90E-09
CKTOOFCD  CNDTUFCO  HUMTSXHC                    6.77E-09
CKTHFFCD  CNDTUFCO  HUMTSXHC                    6.77E-09
MPBCP1LK  HUMCPXLK                              5.83E-09
MPBLP3LK  HUMLPXHC                              5.83E-09
MPBRP5LK  HUMRPXHC                              5.83E-09
CNDTUFCO  PNVTFFDC  HUMTSXHC                    5.07E-09
CNDTUFCO  PNVTOFDC  HUMTSXHC                    5.07E-09
MPBCP2LK  HUMCPXHC                              3.14E-09
MPBRP6LK  HUMRPXHC                              3.14E-09
MPBLP4LK  HUMLPXHC                              2.69E-09
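
As a cross-check on the branch D listing above, the cutset probabilities can be summed and compared with the stated branch total. The sketch below uses the values as read from the scan and assumes the total is a simple sum of cutset probabilities (the rare-event approximation); the text does not state how the total was computed, and the variable names are hypothetical.

```python
# (probability, number of cutsets with that probability), branch D, Table 6-1b,
# as read from the scanned listing.
BRANCH_D_PROBABILITIES = [
    (2.93e-6, 4),   # SEPINHIBIT / CNDTUFCO / SPxxxXFE
    (2.03e-6, 2),   # SEPINHIBIT / CNDTUFCO with CKTOOFCD or CKTHFFCD
    (1.47e-7, 4),   # CNDTUFCO / HUMTSXHC / SPxxxXFE
    (1.48e-8, 3),   # MPBxPXLK / HUMxPXHC
    (8.14e-9, 1),
    (7.90e-9, 2),
    (6.77e-9, 2),
    (5.83e-9, 3),
    (5.07e-9, 2),
    (3.14e-9, 2),
    (2.69e-9, 1),
]

STATED_TOTAL_D = 1.65e-5

n_cutsets = sum(count for _, count in BRANCH_D_PROBABILITIES)
branch_total = sum(p * count for p, count in BRANCH_D_PROBABILITIES)
relative_difference = abs(branch_total - STATED_TOTAL_D) / STATED_TOTAL_D

print(n_cutsets, branch_total, relative_difference)
```

Under these assumptions the sum agrees with the stated total to within rounding of the three-digit table entries.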

A 


TOTAL FOR E:

1.04E-04 

HY2L0WCD 

CNDMXXTM 

6.19E-06 

HY2R0JC0 

CNDMXXTM 

6J9E-06 

HY2L0JC0 

CNDMXXTM 

6.19E-06 

HY2LFJCD 

CNDMXXTM 

6.19E-06 

HY2LFWCD 

CNDMXXTM 

6.19E-06 

HY2R0WCD 

CNDMXXTM 

6.59E-06 

HY2C0JCD 

CNDHXXTM 

6.19E-06 

HY2C0WCD 

CNDMXXTM 

6.19E-06 

HY2RFJCD 

CNDMXXTM 

6. 19E-Q6 

HY2RFWCD 

CNDMXXTM 

6.19E-06 

HY2CFJCD 

CNDMXXTM 

6.19E-06 

HY2CFWCD 

CNDMXXTM 

6.19E-06 

CK2C02CD 

CNDMXXTM 

3.72E-06 

CK2L02CD 

CNDMXXTM 

3.72E-06 

CK2ROZCO 

CNDMXXTM 

3.72E-06 

CK2CFZCD 

CNDMXXTM 

1.89E-06 

CK2LFZCD 

CNDMXXTM 

1 .89E-06 

CK2RFZCD 

CNDMXXTM 

I.B9E-06 

PN210Z0P 

CNDMXXTM 

7.20E-07 

PN2R0Z0P 

CNDMXXTM 

7.20E-07 

PN2C0Z0P 

CNDMXXTM 

7.20E-07 

PN2LFZ0P 

CNDMXXTM 

7.20E-07 

PN2CFZ0P 

CNDMXXTM 

7.20E-07 

PN2RFZ0P 

CNDMXXTM 

7.20E-07 

HY2CFWCD 

CNDMXXTM 

* 

4.37E-07 

HY2RFWC0 

CNDMXXTM 

4.37E-07 

HY2C0JCD 

CNDMXXTM 

4.37E-07 

HY2R0JC0 

CNDMXXTM 

4.37E-07 

HY2C0WCD 

CNDMXXTM 

4.37E-0? 

HY2R0WCD 

CNDMXXTM 

4.37E-07 

HY2CFJC0 

CNDMXXTM 

4.37E-07 

HY2RFJCD 

CNDMXXTM 

4.37E-07 

HY2LFJCD 

CNDMXXTM 

4.37E-07 

HY2L0JCD 

CNDMXXTM 

4.37E-07 

HY2LFWCD 

CNDMXXTM 

4 37E-07 

HY2L0WCD 

CNDMXXTM 

4.37E-07 

CK2C02CD 

CNDMXXTM 

2.63E-07 

CK2L0ZCD 

CNDMXXTM 

2.63E-07 

CK2R0ZCD 

CNDMXXTM 

2.63E-07 

CKT30DSP 

2.63E-07 

CKT3FDSP 

2.63E-07 

MPBRPXLK  | 

HUMRPXHC 

2.09E-07 

MPBRPXLK  1 

-iUMRPXHC 

2.09E-07 

MPBIPXLK  1 

HUMLPXHC 

2.09E-07 

MPBCPXIK  1 

■fUMCPXHC 

2.09E-07 

SPVRP5DC  1 

1PBVP5LK  1 

HUMRPXHC 

1 .1  IE-07 

SPVCP10C  |l 

1PBVP1LK  | 

HUMCPXHC 

1.UE-07 
SPVLP3DC  HUMLPXHC  MPBVP3LK
MPBCPILK  HUMCPXLK 
MPBLP3LK  HUMLPXHC 
MPBRP5LK  HUMRPXHC 
MPBRPBLK  HUMRPXHC 
MPBCP2LK  HUMCPXHC 
MPBLP4LK  HUMLPXHC 


SEPINHIBIT 

SEPiNHIBlT 

SEPINHIBIT 

SEPINHIBIT 

SEPINHIBIT 

CKTHFFCD 

CNOTUFCO 

CNDTUFCO 

CNDTUFCO 

CNDTUFCO 

MPBRPXIK 

MPBLPXLK 

MPBCPXLK 

CKTOOFCD 

CKTHFFCD 

CNDTUFCO 

CNDTUFCO 

SPVLP3DC 

SPVRP5DC

SPVCP1DC 

MPBCPILK 

MPBLP3LK 

MPBRP5LK

MPBCP2LK 

MPBRP6LK 

MPBLP4LK 


TOTAL  FOR  F; 


CNDTUFCO 

CNDTUFCO 

CNDTUFCO 

CNDTUFCO 

CKTOOFCD 

SEPINHIBIT 

HUMTSXHC 

HUMTSXHC 

HUMTSXHC 

HUMTSXHC 

HUMRPXHC 

HUMLPXHC 

HUMCPXHC 

CNDTUFCO 

CNDTUFCO 

PNVTFFDC 

PNVTOFDC 

HUMLPXHC 

MPBVP5LK 

MPBVP1LK 

HUMCPXLK 

HUMLPXHC 

HUMRPXHC 

HUMCPXHC 

HUMRPXHC 

HUMLPXHC 


SP130XFE 

SP230XFE 

SP13FXFE 

SP23FXFE 

CNDTUFCO 

CNDTUFCO 

SP130XFE 

SP230XFE 

SP13FXFE 

SP23FXFE 


HUMTSXHC 

HUMTSXHC 

HUMTSXHC 

HUMTSXHC 

MPBVP3LK 

HUMRPXHC 

HUMCPXHC 
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G 

TOTAL FOR G:                        1.65E-05

SEPINHIBIT  CNDTUFCO  SP130XFE      2.93E-06
SEPINHIBIT  CNDTUFCO  SP230XFE      2.93E-06
SEPINHIBIT  CNDTUFCO  SP13FXFE      2.93E-06
SEPINHIBIT  CNDTUFCO  SP23FXFE      2.93E-06
SEPINHIBIT  CKTOOFCD  CNDTUFCO      2.03E-06
CKTHFFCD    SEPINHIBIT  CNDTUFCO    2.03E-06
CNDTUFCO    HUMTSXHC  SP130XFE      1.47E-07
CNDTUFCO    HUMTSXHC  SP230XFE      1.47E-07
CNDTUFCO    HUMTSXHC  SP13FXFE      1.47E-07
CNDTUFCO    HUMTSXHC  SP23FXFE      1.47E-07
MPBRPXLK    HUMRPXHC                1.48E-08
MPBLPXLK    HUMLPXHC                1.48E-08
MPBCPXLK    HUMCPXHC                1.48E-09
CKTOOFCD    CNDTUFCO  HUMTSXHC      1.37E-08
CKTHFFCD    CNDTUFCO  HUMTSXHC      1.37E-08
CNDTUFCO    PNVTFFDC  HUMTSXHC      1.03E-06
CNDTUFCO    PNVTOFDC  HUMTSXHC      1.03E-06
SPVLP3DC    HUMLPXHC  MPBVP3LK      6.14E-09
SPVRP5DC    MPBVP5LK  HUMRPXHC      7.90E-09
SPVCP1DC    MPBVP1LK  HUMCPXHC      7.90E-09
MPBCP1LK    HUMCPXLK                5.83E-09
MPBLP3LK    HUMLPXHC                5.83E-09
MPBRP5LK    HUMRPXHC                5.83E-09
MPBCP2LK    HUMCPXHC                3.14E-09
MPBRP6LK    HUMRPXHC                3.14E-09
MPBLP4LK    HUMLPXHC                2.69E-09

PHASE  H: 

No cutsets above probability 1E-10
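
The cutset listings above can be checked against their printed totals. The sketch below is illustrative only: it takes the six leading cutsets listed for G (four at 2.93E-06 and two at 2.03E-06) and combines them two ways, by the rare-event sum and by the min-cut upper bound. Treating the cutsets as independent is an assumption made here, not a statement of how the report's totals were computed.

```python
import math

# Six leading cutset probabilities as printed in the listing for G.
leading_cutsets = [2.93e-06] * 4 + [2.03e-06] * 2

def rare_event_sum(cutset_probs):
    """Rare-event approximation: simply add the cutset probabilities."""
    return sum(cutset_probs)

def min_cut_upper_bound(cutset_probs):
    """Min-cut upper bound, 1 - prod(1 - p_i), assuming independent cutsets."""
    return 1.0 - math.prod(1.0 - p for p in cutset_probs)

approx = rare_event_sum(leading_cutsets)
bound = min_cut_upper_bound(leading_cutsets)

print(f"rare-event sum : {approx:.3e}")
print(f"min-cut bound  : {bound:.3e}")
print(f"share of printed total (1.65E-05): {approx / 1.65e-05:.1%}")
```

At probabilities this small the two combinations agree to several significant figures, which is why a simple sum is usually adequate for listings of this kind.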
FIGURE 6-1a: MISSION TIME SEQUENCE EVENT TREE

Explosion, Overpressurization, & Non-recoverable Events

* Not quantified. Out of analytical scope.

** For definitions of coded consequences, see Table 6-2


FIGURE 6-1b: MISSION TIME SEQUENCE EVENT TREE

Recoverable Events (Functional Failures)

* Not quantified. Out of analytical scope.

** For definitions of coded consequences, see Table 6-2


TABLE 6-2

Definition of Consequence Categories
and Specific Consequences


HUMAN LOSSES

C  - Mission Crew
G  - Ground Support Team
O1 - Other persons in vicinity susceptible to fatalities incurred
     during RTLS abort landing or explosions during flight near
     the launch facility
O2 - Other persons in vicinity susceptible to fatalities incurred
     during TAL abort landing
O3 - Other persons in vicinity susceptible to


HARDWARE LOSSES

S  - Space Transportation System
F  - Ground Facilities and Support Equipment
L1 - Abort Landing Facilities - RTLS
L2 - Abort Landing Facilities - TAL
L3 - Abort Landing Facilities - orbit abort
M1 - Miscellaneous damage resulting from dispersion of
     explosion debris prior to T + 30s.
M2 - Miscellaneous damage resulting from dispersion of
     explosion debris when STS is on the launch pad


LMSC-F22304 


TABLE 6-3

Consequence Data Summary

HUMAN LOSS: Expected Number of Fatalities (Crew, Ground Support Team, Other)
HARDWARE LOSS: Million Dollars Lost (STS, Ground Facilities (3),
               Abort Landing Facilities (4), Misc.)

TIME (t)             Crew  Ground      Other       STS   Ground  Abort   Misc.
                           Support                       Facil.  Landing
                           Team                          (3)     Facil.
                                                                 (4)

-9 hours to          N/A   (1)         N/A         1300  500     N/A     (5) 10
-2 hours                   negligible

-2 hours to          7     (1)         N/A         1300  500     N/A     (5) 10
-10 seconds                negligible

-10 seconds to       7     (1)         N/A         1300  500     N/A     (5) 10
+30 seconds                negligible

+30 seconds to       7     N/A         (2) 6.5e-7  1300  N/A     N/A     [illegible]
+2.5 minutes

+2.5 minutes to      7     N/A         negligible  1300  N/A     50      (6) negligible
+4 minutes

+4 minutes to        7     N/A         negligible  1300  N/A     50      (6) negligible
+8.1 minutes

+8.1 minutes to      7     N/A         N/A         1300  N/A     50      N/A
abort landing

NOTES:

(1) Reference 34, Table 10-3, Case No. 1.

(2) Reference 34, Table 10-3. Modify Ec by scaling by 1.78e-4/1.1e-3 to
    accommodate Figure 6-1a, branch C probability of hazard versus that
    computed in Table 10-3.

(3) Reference (to be provided).

(4) RTLS, TAL and Orbital Abort landing sites.

(5) Assume $10 million per incident for surrounding buildings & structures.

(6) Reference 34, Table 10-3, take computed value of PI for stage 1 and
    assume $10M per incident
Aggregate Probabilities and Risk


Category
applicable
sequence
probabilities
(Fig. 6-1a)


Probability  of  MPPS-Related  Events  Potentially  leading  to  Loss  of  Human  Life 
Successful  Abort  Scenario 


2.02E-4 
2.05E-3 
1.29E-5


RISK* 
Expected  No 
of  lives  lost 


negligible 


0.0E+00  0.0E+00  0.0E+00


Probability of MPPS-Related Events Potentially Leading to Loss of Human Life
Unsuccessful  Abort  Scenario 


RISK*  j 1 .7E-t 

Expected  No. 
of  lives  lost  | 

* Derived  from  Table  6-3 


1.7E-02  negligible


3.3E-13  negligible  negligible
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Aggregate  Probabilities  and  Risk 

Probability  of  MPPS-Related  Events  Potentially  Leading  to  Loss  of  Hardware  or  Facilities 
Successful  Abort  Scenario 


i 

I 


Category 

s 

LI 

L2 

L3 

Ml 

M2 

applicable 
sequence 
probabilities 
(Fig. 6-1a)

2.02E-4
2.05E-3
1.29E-5

2.02E-4

subtotal 

2.26E-3 

2.02E-4 

0.00E+0

0.00E+0

0.00E+0

0.00E+0

applicable 
sequence 
probabilities 
(Fig. 6-1b)

2.25E-7
7.15E-7
8.85E-12
1.72E-9
8.25E-6

2.25E-7

subtotal

9.20E-6

2.25E-7

0.00E+0

0.00E+0

0.00E+0

0.00E+0

TOTAL  2.3E-3  2.0E-4  0.0E+0  0.0E+0  0.0E+0  2.0E-4  0.0E+0

RISK* 

Expected  loss 
of  hardware 
(in  $M) 

3 

0.1 

0 

0 

0 

negligible 

negligible 

Probability  of  MPPS-Related  Events  Potentially  Leading  to  Loss  of  Hardware  or  Facilities 


Unsuccessful Abort Scenario


s 

F 

LI 

L2 

L3 

Ml 

M2 

applicable

-

-

-

-

2.02E-4

sequence

-

-

probabilities

2.02E-4

2.02E-4

(Fig. 6-1a)

2.05E-3

2.26E-3

2.00E-4

0.00E+0

0.00E+0

0.00E+0

2.02E-4

0.00E+0

applicable 

2.25E-7 

2.25E-7 

5.38E-7

1.04E-4

8.25E-6

-

sequence

7.15E-7

probabilities

5.38E-7

(Fig. 6-1b)

1.04E-4

8.25E-6

1.14E-4

2.25E-7

5.38E-7

1.04E-4

8.25E-6

0.00E+0

0.00E+0

TOTAL


2.4E-3  2.0E-4  5.4E-7  1.0E-4  8.3E-6  2.0E-4  0.0E+0


RISK* 

Expected  loss 
of  hardware 
(in $M)

3

0.1

negligible

0.0003

negligible

negligible

negligible

* Derived  from  Table  6-3 
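
The RISK rows above follow from multiplying an aggregate probability by the corresponding dollar consequence in Table 6-3. As a minimal sketch, assuming Table 6-3 is read as giving $1300M for the STS (category S) and $500M for ground facilities (category F), the first two entries of the unsuccessful-abort table can be reproduced as follows. The dictionary names are illustrative and not part of the report.

```python
# Dollar consequence per event in $M, as read from Table 6-3 (assumed mapping).
COST_M = {"S": 1300.0, "F": 500.0}

# TOTAL probabilities printed for the unsuccessful-abort scenario.
TOTAL_PROB = {"S": 2.4e-3, "F": 2.0e-4}

def expected_loss(probability, cost_m):
    """Expected hardware loss in $M: probability times consequence."""
    return probability * cost_m

risk = {cat: expected_loss(TOTAL_PROB[cat], COST_M[cat]) for cat in TOTAL_PROB}

for cat, value in risk.items():
    print(f"{cat}: expected loss about ${value:.2f}M")
```

Rounded to one significant figure these values match the printed entries of 3 and 0.1.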
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SPACE  SHUTTLE 

PROBABILISTIC RISK ASSESSMENT
PROOF-OF-CONCEPT STUDY
ANALYSIS  REPORT 


EXECUTIVE  SUMMARY 

This  document  focuses  on  the  transfer  of  the  Probabilistic  Risk 
Assessment  (PRA)  methodology  to  a Space  Shuttle  environment 
utilizing  the  Auxiliary  Power  Unit  (APU)  and  Hydraulic  Power  Unit 
(HPU)  as  typical  examples  of  spacecraft  subsystems.  This  volume 
presents  specific  PRA  findings  of  this  proof-of-concept  study  and 
attempts  to  answer  the  following  question:  Can  the  PRA  methodology 
be  transferred  to  a space  system? 

The  study  results  resembled  those  of  previous  PRAs  accomplished 
in  other  industries.  The  study  produced  a quantification  of  the 
frequency  of  certain  undesired  end  states,  along  with  a ranking 
of  specific  subsystem  failure  modes  by  their  contribution  to  the 
risk  of  these  end  states. 

For the APU, the study indicates that five failures account for
about 80% of the total risk of Loss of Crew/Vehicle (LOC/V) during
a typical flight. Including an additional five failures raises the
share accounted for to over 90% of the total risk. The common
hazard associated with the
first  five  failures  is  hydrazine  leakage  into  the  aft  compartment. 
This  creates  the  potential  for  fire,  as  demonstrated  at  the 
conclusion  of  the  STS-9  mission  when  there  were  two  APU  fires. 

The HPU has two failures that represent over 98% of the contribution
to LOC/V. These contributions could arise from common cause lube oil
contamination in two HPUs by fuel leaking into the gearbox, or by
introduction of foreign substances into the gearbox, and from turbine
wheel failures.

The APUs are about two orders of magnitude more of a risk to the
safety of the Shuttle than are the HPUs. The bulk of the risk
from the APUs arises from the potential for hydrazine leaks to
manifest themselves as a fire during entry.

The  PRA  results  indicate  that  for  both  the  APU  and  HPU,  only 
a few  failures  account  for  the  majority  of  the  risk  during  a 
typical  flight.  The  results  illuminated  no  new  areas  of  concern 
or  failures  not  previously  known,  but  do  identify  the  high  risk 
failure  scenarios  that  map  the  paths  between  the  end  states  and 
individual  APU  and  HPU  failures. 

The  PRA,  therefore,  provided  a quantitative  way  of  prioritizing 
the  known  safety  concerns  and  failure  modes.  It  also  provided 
an  estimate  of  the  magnitude  of  risk  of  each  safety  concern. 
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2.0  INTRODUCTION 


McDonnell  Douglas  was  selected  by  the  National  Aeronautics  and 
Space  Administration  (NASA)  to  assess  the  Probabilistic  Risk 
Assessment  (PRA)  methodology  when  applied  to  a space  system. 

The  PRA  has  been  in  use  by  other  industries  for  many  years.  The 
study  attempts  to  provide  insight  to  answer  several  questions. 

One of these questions is: Can the PRA methodology be transferred
to a space system?

This  volume  provides  information  for  the  evaluation  of  the  PRA 
methodology  transfer,  the  benefits  to  be  gained  from  application 
of  PRA  methodology,  and  the  information  necessary  for  the  FMEA/ 
CIL  comparison  described  in  Volume  II.  Volume  I discusses  the 
management  aspects  of  the  study  as  related  to  the  results. 

Volume  IV  documents  the  PRA  preparation  instructions. 

Pickard, Lowe, and Garrick, Inc. (PLG), a firm experienced in the
use  of  the  PRA  technique  in  other  industries,  was  selected  as  a 
subcontractor  to  provide  the  expertise  and  software  analysis 
tools  necessary  to  adapt  the  PRA  methodology  to  the  Space  Shuttle 
environment. 

Two  subsystems  were  chosen  for  this  proof-of-concept  study: 

a. The Orbiter Auxiliary Power Unit (APU), designed and
manufactured by the Sundstrand Corporation as a
subcontractor to Rockwell International Corporation, and

b. The Solid Rocket Booster (SRB) Hydraulic Power Unit (HPU),
also manufactured by the Sundstrand Corporation but under
contract to United Space Boosters Incorporated (USBI).

The  system  configuration  of  the  APU  and  HPU  used  in  this  study 
was  that  which  existed  as  of  January  1986.  The  "Improved"  APU 
and  post-51L  flight  modifications  to  the  APU  were  not  analyzed, 
except  as  specifically  noted  elsewhere  in  the  report. 

The  PRA  process  offers  a different  type  of  risk  analysis  tool 
available  to  industries  or  agencies  who  must  deal  with  risk 
assessment.  The  PRA  begins  with  the  consideration  of  effects 
that  are  deemed  undesirable.  The  analysis  proceeds  from  the 
top  down  through  the  system  or  systems  via  scenario  paths  that 
ultimately  lead  to  the  failed  component  or  assembly.  The 
process  proceeds  to  the  lowest  level  of  detail  that  time, 
effort,  funds,  or  available  data  permits. 
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The  Probabilistic  Risk  Assessment  involves: 

a.  An  integrated  model  of  the  responses  of  an  engineered 
system  to  disturbances  during  operation 

b.  A rigorous  and  systematic  identification  of  the  levels  of 
damage  that  could  conceivably  result  from  those  responses 

c.  A quantitative  assessment  of  the  frequency  of  such 
occurrences  and  of  the  uncertainty  in  that  assessment 


Although  the  PRA  process  produces  a quantification  of  risk,  the 
actual  numbers  produced  are  not  the  only  important  results. 

The  important  results  from  a risk  management  perspective  are: 

a.  The  insight  gained  into  the  system  under  study 

b.  The  frequency  of  occurrence  of  the  damage  states 

c.  The  relative  ranking  of  failure  scenarios  and  component 
failure  modes 

d.  Identification  of  failure  modes  which  account  for  the 
majority  of  the  risk 

e.  How  well  the  risk  is  known  (uncertainty  of  the  results) 

The  PRA  is  a decision-making  tool  for  managing  the  risk  associated 
with  the  system  under  investigation.  It  points  out  weak  areas  in 
the  system,  and  aids  in  deciding  where  "fixes"  are  warranted.  The 
numbers produced are valuable to the extent that they give a
decision-maker  a way  to  decide  what  is  important  and  what  is  not 
important.  Resources  may  then  be  allocated  based  on  specific  needs 
such  as  reduction  of  high  risk,  cost,  or  schedule  impact. 

The  next  section  summarizes  the  conclusions  and  insights  gained 
into  the  transfer  of  PRA  methodology  to  these  Shuttle  subsystems, 
as well as insight gained into the APU and HPU risk. The individual
risk contributors which comprise 99% of the risk to LOC/V
were ranked according to their contribution to the likelihood of
the damage state. The risk contributors that collectively
represent 1% of the risk were grouped.
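
The ranking and grouping just described can be sketched as a cumulative cut on sorted contributions. The contributor names and probabilities below are hypothetical placeholders, not values from this study, and the function is an illustration of the idea rather than the procedure actually used.

```python
def rank_contributors(contributions, threshold=0.99):
    """Rank contributors by share of total risk and keep the leading ones
    until their cumulative share reaches the threshold. Returns the kept
    (name, share) pairs and the share left to the grouped remainder."""
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for name, value in ranked:
        if cumulative >= threshold * total:
            break
        cumulative += value
        kept.append((name, value / total))
    grouped_share = 1.0 - cumulative / total
    return kept, grouped_share

# Hypothetical per-flight contributions to the damage state.
example = {
    "leak_A": 6.0e-4,
    "leak_B": 2.5e-4,
    "turbine": 1.0e-4,
    "valve": 4.5e-5,
    "sensor": 4.0e-6,
    "wiring": 1.0e-6,
}

kept, grouped_share = rank_contributors(example)
for name, share in kept:
    print(f"{name:8s} {share:6.1%}")
print(f"all others (grouped) {grouped_share:.1%}")
```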
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The  remaining  sections,  4 through  11,  describe  the  APU  and  HPU 
system  configuration  used  for  this  study,  the  PRA  methodology,  the 
application  of  that  methodology,  and  the  conclusions  and  insights 
that  were  obtained  during  the  course  of  this  study. 

Assumptions  are  inherent  to  any  analysis  and  PRA  is  no  exception. 
Assumptions  were  made  to  define  the  boundaries  of  each  system,  the 
system  interfaces,  the  boundary  conditions  of  the  interfaces,  and 
the  general  modeling  guidelines  used  to  conduct  the  study.  These 
assumptions  and  guidelines  are  described  in  detail  in  Appendix  A 
and  are  discussed  where  appropriate  in  Sections  5 through  11. 

The results presented in this volume are intended to be
representative of the kind obtained by a PRA and not indicative
of actual Shuttle results. The numerical predictions of LOC/V
from the pilot study are not deemed reliable, because the database
used was uncertified, the various designs and diagrams had
not been subjected to any configuration control, and the PRA
process itself was not conducted with any peer review or management
oversight function. For this reason, any risk numbers or
probability  curves  discussed  in  the  later  volumes  of  this  report 
are  purely  representational  in  nature,  and  should  not  be  used  for 
hardware  certification,  flight  readiness  review,  nor  should  they 
be  regarded  as  being  an  accurate  expression  of  the  reliability 
of  either  the  APU  or  HPU.  The  results  are  intended  only  as  a 
"template"  to  test  fit  the  PRA  methodology,  and  should  not  be 
taken  out  of  context  or  used  for  any  other  engineering  purpose. 
3.0 SUMMARY CONCLUSIONS AND INSIGHTS


This  section  presents  a summary  of  the  technical  conclusions  and 
lessons  learned  concerning  the  transfer  of  PRA  technology  to  the 
Space  Shuttle.  It  also  provides  insights  into  the  risk  posed  by 
operation  of  the  Auxiliary  Power  Unit  (APU)  and  Hydraulic  Power 
Unit  (HPU)  on  a typical  Shuttle  mission,  and  lessons  learned  which 
may  be  of  value  for  implementing  PRA  on  other  space  systems. 


3.1  PRA  TECHNOLOGY  TRANSFER 

The  PRA  techniques  (such  as  fault  trees  and  event  trees)  applied  in 
this  study  have  reached  various  states  of  sophistication  through 
application  in  the  nuclear,  chemical,  and  aircraft  industries. 

Space  Shuttle  systems,  their  interfaces  with  each  other,  with 
operators,  and  with  operating  procedures,  share  much  in  common 
with  systems  in  these  industries.  It  was,  therefore,  expected 
that  PRA  techniques  could  be  applied  to  the  Space  Shuttle;  the 
difficulty  of  the  task  was  the  unknown. 

A successful application of PRA techniques requires a balance of
knowledgeable PRA personnel and system experts; each must acquire
some of the skills of the other. This proof-of-concept study
successfully demonstrated the adaptation of PRA techniques on two
Shuttle subsystems in the following manner: The damage states on
which the study was based were identified; the study groundrules
and  constraints  were  developed;  the  PRA  models  were  developed;  the 
historical  records  of  past  missions  and  of  the  APU  and  HPU  were 
obtained  and  analyzed;  action  items  were  generated  to  resolve 
important issues concerning hydrazine and its properties; databases
were developed to compile and correlate failure history
data;  the  models  were  quantified;  the  uncertainties  in  the  data 
and  models  were  developed  using  probability  distributions;  the 
risk  profiles  were  obtained;  and  the  contributors  to  the  Shuttle's 
risk  due  to  the  APU  and  HPU  were  identified  and  ranked. 
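
The step of expressing data uncertainty as probability distributions and carrying it through to a risk profile can be illustrated with a small Monte Carlo sketch. Everything specific here is an assumption for illustration: the lognormal form, the medians and error factors, and the two-factor scenario model are hypothetical and are not taken from the study.

```python
import math
import random

def sample_lognormal(median, error_factor, rng):
    """Draw from a lognormal given its median and error factor,
    where error factor = 95th percentile / median."""
    sigma = math.log(error_factor) / 1.645  # 1.645 = standard normal 95th percentile
    return rng.lognormvariate(math.log(median), sigma)

def percentile(sorted_values, q):
    """Simple rank-based percentile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]

def propagate(n_samples=20000, seed=1):
    """Sample a hypothetical scenario frequency = leak frequency per flight
    times conditional probability that the leak reaches the damage state."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        leak = sample_lognormal(1.0e-2, 3.0, rng)
        damage_given_leak = min(1.0, sample_lognormal(5.0e-2, 10.0, rng))
        results.append(leak * damage_given_leak)
    results.sort()
    return results

samples = propagate()
profile = {
    "5th": percentile(samples, 0.05),
    "50th": percentile(samples, 0.50),
    "mean": sum(samples) / len(samples),
    "95th": percentile(samples, 0.95),
}
for label, value in profile.items():
    print(f"{label:>5}: {value:.2e}")
```

With skewed inputs of this kind the mean typically lies above the median, which is one reason a risk profile is reported as a set of percentiles rather than a single number.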

PRA  may  be  considered  an  "engineering  art"  in  which  the  combined 
skills  and  knowledge  of  many  are  required  to  apply  the  basic  PRA 
techniques  in  combinations  which  accurately  and  logically  model  the 
risk posed by the system. There were no standard "cook book" procedures
for applying PRA techniques to the Space Shuttle systems.

A generalized set of PRA techniques was developed as part of this
study which may have application to other space systems.

This  study  identified  and  documented  how  failures  initiated  by 
the  APU  or  HPU  can  propagate  through  a subsystem  to  cause 
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degraded  performance,  shuttle  damage,  or  mission  curtailment. 

This was accomplished by identifying damage states, and by identifying
failure scenarios emanating from initial failures in the APU
or HPU that lead to the damage states. The damage states used in
this study were Loss of Crew/Vehicle, and Loss of Mission. Loss
of mission was further divided into intact abort, Primary Landing
Site (PLS) entry, and launch scrub. A risk profile, which
represents the likelihood of the damage state occurring and the
uncertainty about that likelihood, was assessed for each of these
damage states. The study was able to divide the mission into
stages  that  allowed  the  assessment  of  risk  for  ascent  as  distinct 
from  orbit  and  entry.  The  PRA  addressed  mechanical,  electrical 
and  electronic  failures,  interactions  caused  by  functional  and 
spatial  relationships,  and  failures  of  multiple  components  due  to 
a common  cause. 

It  should  be  noted  that  additional  damage  states  could  have 
been  selected  which,  for  example,  allow  for  the  identification 
of  equipment  damage  and  subsequent  cost  of  repairing  failures. 
Additional  damage  states  such  as  these  add  unnecessary 
complexity  when  one  is  primarily  interested  in  damage  states 
that  pose  risk  of  LOC/V.  However,  the  techniques  appear  quite 
capable  of  quantifying  risk  to  equipment  just  as  reliably  as 
they  handle  the  more  serious  cases. 


3.2  CONCLUSIONS  AND  INSIGHTS  INTO  THE  RISK  OF  THE  APU  AND  HPU 

The  PRA  results  present  risk-related  information  about  the  APU 
and  HPU  in  several  ways.  They  provide  risk  profiles,  a ranked 
order  of  scenarios  contributing  to  the  risk  profiles,  a ranked 
order  of  APU/HPU  failures  contributing  to  the  failure  scenarios, 
and  a ranking  of  component  failure  modes  that  contribute  to  the 
risk  profile. 

The  risk  profiles  for  loss  of  crew/vehicle  for  the  APU  and  HPU 
are  shown  in  Figure  3-1.  These  data  are  proof-of-concept  study 
results and are not to be used for engineering, design evaluation,
or flight certification. The contribution of HPU risk to
the  Shuttle  is  clearly  much  lower  than  the  contribution  of  APU 
risk,  even  with  uncertainties  included. 


3.2.1  Insights  Into  APU  Risk 

What  are  the  major  risk  contributors  of  the  APU?  Table  3-1,  at 
the  end  of  this  section,  presents  the  APU  risk  contributors 
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FIGURE  3-1  PROBABILITY  DISTRIBUTION  COMPARISON 


(failure modes) that contribute over 99% of the likelihood of
loss of crew/vehicle during a flight. The risk from all other
contributors combined, therefore, makes a negligible contribution
to the overall risk associated with APUs. The first three major
contributors are: (1) hydrazine leakage into the aft compartment
from at least one APU during orbit or entry with potential for
fire or corrosion damage to other equipment, (2) hydrazine leakage
into either isolation valve solenoid cavity, and (3) failure
of the APU turbine wheel. This includes all failures of the
turbine  such  as  bearing  seizure  and  fragmentation  of  the  wheel 
causing  shrapnel  damage  to  other  equipment.  Hydrazine  leakage 
contributes  about  four  times  more  to  risk  than  all  the  others 
combined.  Therefore,  reducing  either  the  likelihood  or  effects 
of  this  leakage  would  provide  the  most  benefit  in  terms  of  risk 
reduction  for  invested  resources. 

The  large  (74.6%)  contribution  from  the  general  category  of 
hydrazine  leaks  downstream  of  the  isolation  valves,  and  the 
desire  to  rank  the  risk  contributors  to  a finer  detail,  led 
to  a second  iteration.  Table  3-2  identifies,  more  specifically, 
the  risk  points  of  leakage  downstream  of  the  isolation  valves. 

For  example,  71.6%  of  this  risk  can  be  attributed  to  the  first 
three  leak  sources.  Fuel  leakage  into  the  fuel  isolation  valve 
remains  high  on  the  risk  table. 

Hydrazine  leakage  was  the  initial  failure  in  many  scenarios. 

The  PRA  identified  and  documented  the  leakage  related  scenarios 
via  event  sequence  diagrams  and  event  trees  as  shown  in 
Appendices  B6.3  and  B6.4,  respectively.  Table  3-3  summarizes 
the  quantified  result  of  this  process  by  presenting  the  percent 
of  the  LOC/V  risk  attributable  to  each  category  of  scenarios 
and  the  percent  contribution  of  the  categories  of  scenarios 
attributable  to  individual  APU  failure  modes.  The  risk  profile 
was  also  broken  down  directly  into  failed  components  or 
assemblies  as  shown  in  Tables  3-1  and  3-2. 

The  LOC/V  risk  from  APUs  is  clearly  dominated  by  leakage  of 
hydrazine  leading  to  the  cascading  effects  of  fire,  hydrazine 
corrosion,  hydrazine  decomposition  reactions,  and  possibly 
detonation.  These  effects  were  assessed  to  lead  to  failure  of 
either  an  adjacent  APU  or  other  flight  critical  equipment  in 
the  aft  compartment  with  a relatively  high  frequency.  This 
assessment  resulted  from  historical  Shuttle  data  and  from  the 
recognition  that  the  aft  compartment  is  very  crowded.  The 
compartment  contains  main  propulsion  equipment,  electronics,  and 
exposed  wiring  whose  insulation  (such  as  Kapton)  is  susceptible 
to  the  damaging  effects  of  hydrazine.  All  are  in  close  proximity 
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to  hydrazine  sources.  There  are  no  effective  barriers  between 
the  hydrazine  sources  and  the  rest  of  the  equipment  in  the  aft 
compartment.  When  the  Shuttle  descends  to  an  altitude  of  about 
60,000  feet  during  entry,  sufficient  atmospheric  oxygen  is 
available to support combustion of free hydrazine in the aft
compartment,  provided  that  an  ignition  source  exists.  The  APUs 
themselves  provide  sufficiently  hot  surfaces  to  ignite  leaking 
hydrazine.  The  effects  of  hydrazine  ignition  were  dramatically 
demonstrated  by  the  two  APU  fires  that  occurred  at  the  end  of  the 
STS-9 flight.

The  study  also  revealed  that  propagating  failure  effects  from 
common  cause  failures  (as  revealed  in  the  APU  failure  history 
database)  led  to  a risk  that  was  far  greater  than  would  be 
expected  if  APUs  were  failing  independently.  The  benefits 
of  redundant  APUs  are  not  being  realized.  The  STS-9  fire 
demonstrated  that  a single  hydrazine  leak  can  fail  two  APUs. 
Restricted  lube  oil  flow  has  affected  the  same  APU  on  two 
separate  missions  due  to  contamination  introduced  during 
ground  servicing.  Restricted  circulation  of  lube  oil  due  to 
contamination  has  already  caused  a launch  scrub.  However,  it 
is  recognized  that  procedures  have  been  instituted  to  minimize 
the  possibility  of  lube  oil  contamination.  In  addition,  a new 
design  in  the  seal  cavity  drain  of  the  Improved  APU  will 
eliminate  the  common  fuel  and  lube  oil  seal  drain  that  exists 
on  the  present  APUs. 
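
The loss of redundancy benefit under common cause failures can be illustrated with a beta-factor model. This is a standard textbook device swapped in here for illustration; the report does not state that it was the model used, and the unit failure probability and beta value below are hypothetical.

```python
def pair_failure_independent(p_unit):
    """Probability that both of two redundant units fail, if independent."""
    return p_unit * p_unit

def pair_failure_beta(p_unit, beta):
    """Beta-factor sketch: a fraction beta of each unit's failure probability
    is a shared cause that fails both units at once; the remainder is
    independent."""
    independent_part = (1.0 - beta) * p_unit
    common_part = beta * p_unit
    return independent_part ** 2 + common_part

p_unit = 1.0e-3  # hypothetical per-flight failure probability of one unit
beta = 0.1       # hypothetical common cause fraction

independent = pair_failure_independent(p_unit)
with_common_cause = pair_failure_beta(p_unit, beta)

print(f"independent pair failure : {independent:.2e}")
print(f"with common cause        : {with_common_cause:.2e}")
print(f"ratio                    : {with_common_cause / independent:.0f}x")
```

Even a modest common cause fraction dominates the result, since the shared term scales with the unit failure probability rather than its square.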

Since hydrazine leakages can occur from any one of the APUs and
a single leak can lead to LOC/V, the presence of three APUs
(two of three of which are required to operate), from a purely
mathematical point of view, is more detrimental to flight safety
than are two. Even without cascading failures, a configuration
in which one out of two must operate for success tends to be more
reliable than a two out of three configuration. One approach
that would significantly reduce the risk would be to effect a
design wherein each of the three APUs is independently capable of
supporting the demands of the Orbiter hydraulic system. Another
less rigorous approach might be to erect barriers to isolate each
APU from the rest of the aft compartment. The barriers would
also serve to reduce the detrimental effects of shrapnel produced
by turbine breakup while operating during the flight.
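
The comparison between a one-out-of-two and a two-out-of-three configuration can be checked with a short binomial calculation. The sketch assumes independent, identical units and uses hypothetical per-unit failure and leak probabilities; it illustrates the argument and is not a model from the study.

```python
from math import comb

def prob_system_failure(n_units, k_required, p_fail):
    """Probability that fewer than k_required of n_units independent,
    identical units operate, given per-unit failure probability p_fail."""
    p_ok = 1.0 - p_fail
    return sum(
        comb(n_units, working) * p_ok**working * p_fail**(n_units - working)
        for working in range(k_required)
    )

def prob_any_leak(n_units, p_leak):
    """Probability that at least one of n_units independent units leaks."""
    return 1.0 - (1.0 - p_leak) ** n_units

p_fail = 0.01  # hypothetical per-unit failure probability
p_leak = 0.01  # hypothetical per-unit leak probability

one_of_two = prob_system_failure(2, 1, p_fail)
two_of_three = prob_system_failure(3, 2, p_fail)

print(f"1-out-of-2 failure probability: {one_of_two:.2e}")
print(f"2-out-of-3 failure probability: {two_of_three:.2e}")
print(f"any leak, 2 units: {prob_any_leak(2, p_leak):.4f}")
print(f"any leak, 3 units: {prob_any_leak(3, p_leak):.4f}")
```

For small failure probabilities the two-out-of-three arrangement fails roughly three times as often as the one-out-of-two arrangement, and the chance of at least one leak also grows with the number of units.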

Because  of  the  high  probability  of  hydrazine  leakage,  inspection 
and  leak  check  procedures  should  be  reviewed  for  adequacy.  Another 
approach  is  to  certify  that  the  vehicle  is  capable  of  operating 
throughout  the  flight  envelope  (ascent  as  well  as  entry)  on  a 
single  APU.  This  would  result  in  significant  reduction  in  the 
risk  of  LOC/V  as  determined  from  this  study.  The  study  results 
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were  heavily  influenced  by  the  assumption  that  two  APUs  were 
required  for  safe  flight. 

Further  results  of  this  study  are  discussed  in  Section  8 of  this 
Volume  and  include  APU  risk  associated  with  launch  scrub  and  with 
the  ascent  phase  of  a typical  flight.  This  Section  has  summarized 
the  orbit/entry  phase  which  poses  the  greater  risk  to  flight. 

3.2.2  Insights  Into  HPU  Risk 

The HPU has been assessed as posing very little risk of loss of
crew/vehicle. Table 3-4 presents a breakdown of the risk profile
into its risk contributors (failure modes). Two failures contribute
over 98% of the risk posed by the HPU. These two failures
are lube oil circulation restriction due to common cause contamination,
and failure of the HPU turbine wheel. As in the APU, this
includes all failures of the turbine including wheel fragmentation
leading to shrapnel damage to other equipment. The risk from all
other failures combined, therefore, makes only a 2% contribution
to the LOC/V risk due to the HPUs. Table 3-5 provides a breakdown
of the risk profile into scenarios and the HPU failures
associated with the scenarios.

The  risk  posed  by  the  HPUs  appears  to  be  far  less  than  that  of  the 
APU for five fundamental reasons:

a.  Risk  is  directly  proportional  to  flight  duration.  The  HPU 
operates  in-flight  for  about  3%  as  long  as  the  APU. 

b.  The  dominant  contributor  to  APU  risk  is  not  appropriate  to  the 
HPU.  The  risk  from  hydrazine  leakage  on  the  APU  is  associated 
with  the  long  duration  that  hydrazine  must  be  contained  during 
orbit,  coupled  with  the  potential  for  fire  during  entry.  The 
HPU  need  contain  hydrazine  for  only  about  2 minutes  during 
ascent  and  the  environment  around  the  HPU  in  the  aft  skirt  is 
purged  with  nitrogen  to  prevent  fires. 

c.  The  SRB  aft  skirt  is  much  less  crowded  with  flight  critical 
equipment  than  the  Orbiter  aft  compartment,  and  the  two  HPUs 
appear  to  be  well  separated.  In  addition,  damage  from  the 
shrapnel  spray  pattern  is  minimized  by  the  orientation  of  the 
turbine wheel. Therefore, cascading effects from either hydrazine
leakage or turbine fragmentation have relatively little
chance  of  harming  a second  HPU  or  flight  critical  equipment. 

d.  The  HPU  is  similar  in  design  to  the  APU  and  is  constructed  by 
the  same  manufacturer.  The  APU  requirements  for  duration  of 
service  and  ability  to  cope  with  the  environmental  extremes  of 
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ascent,  orbit,  entry  and  landing  are  more  demanding  than  is 
required  for  the  HPU.  From  a reliability  viewpoint,  the  HPU 
appears  to  have  a large  design  margin. 

e.  The  HPU  undergoes  a stringent  post  flight  disassembly  and 
refurbishment.  It  also  undergoes  a thorough  pre-flight 
reassembly  and  checkout  procedure.  The  failure  history 
indicates  that  these  procedures  are  effective  in  reducing  the 
frequency  of  failures  during  hot  fire  tests  as  well  as  flight, 
despite  the  detrimental  effects  of  immersing  the  HPUs  in  sea 
water  at  the  end  of  each  flight.  Essentially,  new  HPUs  are 
flown  each  flight. 
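
Item a above, the dependence of risk on operating duration, can be sketched with a constant failure rate model. The exponential form, the rate, and the unit APU exposure are assumptions for illustration; only the 3% exposure ratio comes from the text.

```python
import math

def failure_probability(rate, exposure):
    """Probability of at least one failure over an exposure time,
    assuming a constant failure rate (exponential model)."""
    return -math.expm1(-rate * exposure)

RATE = 1.0e-2                      # hypothetical failures per unit exposure
APU_EXPOSURE = 1.0                 # APU operating time taken as the unit
HPU_EXPOSURE = 0.03 * APU_EXPOSURE # HPU operates about 3% as long (item a)

p_apu = failure_probability(RATE, APU_EXPOSURE)
p_hpu = failure_probability(RATE, HPU_EXPOSURE)

print(f"APU: {p_apu:.3e}")
print(f"HPU: {p_hpu:.3e}")
print(f"HPU/APU ratio: {p_hpu / p_apu:.4f}")
```

When the product of rate and exposure is small, the probability is nearly proportional to exposure, so the ratio stays close to 0.03.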

3.3  PRA  IMPLEMENTATION  LESSONS 

The  application  of  PRA  to  a Shuttle  subsystem  yielded  some  lessons 
about  methodology,  data  acquisition,  and  management  aspects  of  this 
study which may be of benefit for future application of PRA to
other  space  systems. 


3.3.1  Methodology  Lessons 


A number  of  challenges  appeared  during  the  course  of  this  study  and 
several  insights  were  gained  into  the  PRA  process  as  applied  to  an 
aerospace  subsystem  as  a result.  They  are  as  follows: 

a.  Multi-stage  modeling  may  be  required  in  which  the  risk  model  is 
divided  into  stages.  In  this  study  these  stages  were  defined 
on the basis of mission time intervals. Each time interval was
characterized by a different APU mode of operation, a different
set of flight rules, and different potential damage states.

b. Evaluation of cascading failure effects, such as hydrazine
leakage which can propagate damage, requires extensive
modeling  and  analysis  of  physical  processes.  The  results  of 
these  analyses  then  must  be  converted  to  a form  suitable  for 
use  in  a risk  model. 

c.  The  highly  interactive  nature  of  the  APU  with  its 
surroundings  requires  careful  event  tree  design  to  capture 
all  important  dependencies. 


d. Coupling of propagating failure effects with random equipment
failures requires highly coupled fault tree and event tree
models.
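
The multi-stage approach of item a can be illustrated with a minimal sketch. It assumes each stage is described by a conditional failure probability, given survival of all earlier stages, so that the whole-mission probability is one minus the product of the stage survival probabilities. The stage names and numbers below are hypothetical placeholders, not results of this study.

```python
from math import prod


def mission_failure_probability(stage_probs):
    """Combine per-stage conditional failure probabilities.

    Each value is assumed to be P(failure in stage | survived earlier stages).
    """
    survival = prod(1.0 - p for p in stage_probs.values())
    return 1.0 - survival


# Hypothetical, illustrative numbers only (not study results).
stages = {"ascent": 1e-4, "orbit": 5e-5, "entry": 2e-4}
total = mission_failure_probability(stages)
print(f"whole-mission failure probability: {total:.3e}")
```

In the study itself, each stage would carry its own event trees and damage states; the sketch only shows how stage results could be rolled up once they exist.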


Some of the challenges were typical of any first-of-a-kind study.
A PRA  cannot  be  completed  without  a thorough  knowledge  of  the 
system,  its  interfaces,  procedures,  operator  interactions  and 
failure and success history. All task members must share some
degree of this knowledge, as well as acquire certain PRA
skills. Unfamiliarity with the relative importance of
various  APU/HPU  failure  modes  caused  a number  of  false  starts 
with  respect  to  the  risk  model  development.  In  particular,  the 
study  task  group  could  not  draw  on  a deep  well  of  experience  to 
unambiguously  define,  on  the  first  try,  which  aspects  of  the 
scenarios could be treated by event trees, which by fault trees,
which  by  data,  and  which  by  physical  process  modeling. 

The  study  task  group  believes  that  the  optimal  use  of  the 
techniques  has  not  yet  been  found  and  that  application  of  PRA 
techniques  will  continue  to  evolve  toward  an  aerospace  specific 
methodology. 


3.3.2  Data  Acquisition  Lessons 

Although  manned  spaceflight  dictates  a certain  level  of  record 
keeping  in  support  of  safety  and  reliability,  it  was  known  from  the 
outset  that  data  collection  and  validation  was  no  small  driver  in 
the  successful  completion  of  the  study.  Databases  developed  to 
support  the  needs  of  various  organizations  are  not  necessarily  in 
the  format  needed  to  support  a PRA.  In  addition,  the  type  of  data 
needed  for  a PRA  can  be  distinctly  different  from  that  required 
for  other  types  of  analyses.  This  is  especially  true  when  dealing 
with  spatial  considerations  of  the  subsystem  under  study. 

Examples  of  further  data  difficulties  encountered  are  as  follows: 

a. Some failures were written against the APU, using its part
number rather than the part number of the specific component
within the APU that failed. Extra time was required to identify
the actual component that failed.

b.  Incomplete  failure  records  or  partial  data  entries  were  not 
uncommon.  Extra  time  was  required  to  resolve  the  issue,  or 
the  data  was  eventually  discarded  for  lack  of  substantiating 
information. 

c. Different data sources use different computer software and
hardware. This hampered the task of automating the data
for compiling and sorting.




d. Inconsistencies exist in formatting. Failures were tied to
an expected mission or mission date, not a calendar date.
Run times were in different units of time. Extra time was
required for correlating failures and tabulating data.

e.  The  inability  to  determine  exactly  when  design  changes  were 
implemented  made  data  screening  difficult.  What  component 
design  should  be  used  to  establish  failure  rates? 

f.  It  was  difficult  to  use  "borrowed"  data  base  material  which 
lacked  proper  documentation  (e.g.,  data  file  size,  content 
and attributes). Extra time was required to establish
electronic  data  transfer. 

g.  Access  to  the  data  sources  was  difficult.  NASA  vendors  are 
reluctant  to  provide  information  without  formal  authorization 
and,  in  most  cases,  without  compensation. 
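
The unit inconsistency described in item d suggests a normalization step before run times from different sources are tabulated. The following is a minimal sketch under stated assumptions: the record layout, the field names (`source`, `run_time`, `unit`), and the unit labels are hypothetical and do not describe any actual NASA or vendor database.

```python
# Conversion factors to seconds for the time units assumed in this sketch.
SECONDS_PER_UNIT = {"s": 1.0, "min": 60.0, "h": 3600.0}


def run_time_seconds(value, unit):
    """Convert a run time expressed in `unit` to seconds."""
    try:
        return value * SECONDS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unknown time unit: {unit!r}") from None


def total_run_time_seconds(records):
    """Sum run times from records that may use different units."""
    return sum(run_time_seconds(r["run_time"], r["unit"]) for r in records)


# Hypothetical records from three differently formatted sources.
records = [
    {"source": "A", "run_time": 90.0, "unit": "min"},
    {"source": "B", "run_time": 1.5, "unit": "h"},
    {"source": "C", "run_time": 150.0, "unit": "s"},
]
print(total_run_time_seconds(records))
```

Rejecting unknown units, rather than guessing, keeps an unrecognized entry from silently distorting the accumulated exposure time.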

A great  cost  savings  could  be  realized  in  conducting  a PRA  if  the 
appropriate  data  could  be  assembled  into  coherent  and  consistent 
electronic  databases  that  are  easily  accessible. 


3.3.3  Management  Aspects 

Successful  performance  of  a PRA  requires  continuous  interaction 
among  members  of  the  PRA  study  group.  These  members  must  have  a 
great  depth  of  understanding  of  the  system  under  investigation, 
as  well  as  being  thoroughly  familiar  with  PRA  methodology  and 
techniques.  The  model  development  and  data  analysis  requires  a 
disciplined  and  organized  effort;  each  step  and  intermediate 
result  must  be  well  documented. 

While  individual  team  members  may  work  on  different  aspects  of 
the analysis, all aspects must merge into the same risk model.
All  these  factors  point  to  the  necessity  for  continuous, 
effective  intra-team  communication  in  order  to  achieve  a 
coordinated  effort.  There  is,  of  course,  an  additional  need 
for  effective  communication  between  the  study  team  and  other 
NASA  or  contractor  organizations  from  which  the  team  must 
acquire  needed  information. 
TABLE 3-1

IMPORTANCE RANKING OF APU FAILURES
LOC/V - WHOLE FLIGHT - 1st ITERATION

RANK  COMPONENT/ASSEMBLY FAILURE RISK CONTRIBUTORS  % CONTRIBUTION

   1  Fuel System Leak Into Aft Compartment From         74.6
      Location Downstream of Isolation Valve
   2  Leak Into Fuel Isolation Valve Solenoid Cavity      3.8
   3  Turbine Wheel Failure                               3.8
   4  Leak Into Primary Valve Solenoid Cavity             2.9
      (GGVM Detonation)
   5  Primary Valve Fails Closed at APU Start             2.4
   6  Lube Oil Circulation Restricted                     2.3
   7  Fuel Tank GN2 Fill Q.D. Leakage (Low Fuel           1.8
      Tank Pressure)
   8  Any MPU Fails High at APU Start*                    1.3
   9  Fuel Tank Diaphragm Leakage                         1.2
  10  Secondary Fuel Valve Fails to Open at APU           0.9
      Start
  11  Heater Pair 116/117 Fails Off on Orbit              0.8
  12  Any MPU Fails High While APU is Running*            0.7
  13  MPU 1 Fails Low at APU Start                        0.7
  14  Loss of Power to Secondary Fuel Valve at            0.6
      APU Start
  15  Loss of Power to Fuel Tank Isolation Valves         0.6
      at APU Start
  16  Fuel Tank GN2 Leakage                               0.5
  17  Fuel Pump Bypass Valve Fails to Close After         0.4
      APU Start
  18  Heater Pair 111/112 Fails Off On Orbit              0.3
  19  Secondary Fuel Valve Controller Output Fails        0.1
      Off at APU Start
  20  Fuel Isolation Valve Fails to Close at APU          0.08
      Shutdown (GGVM Large Leak)
  21  Fuel Isolation Valve Leaks at Closure After         0.08
      Ascent
  22  Loss of Power to Secondary Fuel Valve While         0.02
      APU is Running
  23  Primary Fuel Valve Controller Output Fails          0.01
      On While APU Running
  24  Secondary Fuel Valve Controller Output Fails        0.01
      Off While APU Running
  25  All Other Failures                                  0.10

      Total                                             100.00

* Later information indicates that MPU fail high may not
be a credible failure mode

NOTE: Proof-of-concept study results. Not approved
for design evaluation or flight certification.
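
As a consistency check on Table 3-1, the percentages can be summed and accumulated in rank order. The sketch below copies the 25 values from the table; the helper `ranks_needed` is a hypothetical name introduced only for this illustration.

```python
from itertools import accumulate

# Percent contributions from Table 3-1, ranks 1 through 25, in rank order.
contributions = [
    74.6, 3.8, 3.8, 2.9, 2.4, 2.3, 1.8, 1.3, 1.2, 0.9, 0.8, 0.7, 0.7, 0.6,
    0.6, 0.5, 0.4, 0.3, 0.1, 0.08, 0.08, 0.02, 0.01, 0.01, 0.10,
]

# Round to two decimals to suppress floating-point noise in the sums.
total = round(sum(contributions), 2)
cumulative = [round(c, 2) for c in accumulate(contributions)]


def ranks_needed(threshold):
    """Smallest number of top-ranked contributors covering `threshold` percent."""
    for rank, running in enumerate(cumulative, start=1):
        if running >= threshold:
            return rank
    return None


print(total, ranks_needed(90))
```

The listed values sum to the stated total of 100 percent, and the first seven contributors account for more than 90 percent of the first-iteration risk.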
TABLE 3-2

IMPORTANCE RANKING OF APU FAILURES
LOC/V - WHOLE FLIGHT - 2nd ITERATION

RANK  COMPONENT/ASSEMBLY FAILURE RISK CONTRIBUTORS  % CONTRIBUTION

   1  Leakage From Gas Generator Injector Tube           35.5
   2  Leakage From Fuel Lines and Fittings               23.3
   3  Leakage From Fuel Pump                             12.8
   4  Leak Into Fuel Isolation Valve Solenoid Cavity      4.0
   5  Leak Into Primary Valve Solenoid Cavity (GGVM       3.3
      Detonation)
   6  Primary Valve Fails Closed While Pulsing            3.1
   7  External Leakage From GGVM                          3.0
   8  Lube Oil Circulation Restricted                     2.8
   9  Fuel Pump Shaft Seal Detonation                     1.8
  10  Fuel Tank GN2 Fill Q.D. Leakage (Low Fuel Tank      1.7
      Pressure)
  11  Heater Pair 111/112 Fails Off On Orbit              1.6
  12  Heater Pair 116/117 Fails Off On Orbit              1.4
  13  Fuel Tank Diaphragm Leakage                         1.1
  14  Secondary Fuel Valve Fails To Open At APU           0.9
      Start
  15  MPU 1 Fails Low At APU Start                        0.7
  16  Loss Of Power To Secondary Fuel Valve At APU        0.5
      Start
  17  Loss Of Power To Fuel Tank Isolation Valves         0.5
      At APU Start
  18  Turbine Wheel Failure                               0.4
  19  Fuel Tank GN2 Leakage                               0.4
  20  Fuel Pump Bypass Valve Fails To Close After         0.3
      APU Start

      Subtotal                                           99.1

  21  Leakage From Fuel Line Flex Hose                    0.30
  22  Secondary Fuel Valve Controller Output Fails        0.09
      Off At APU Start
  23  Leakage From Fuel High Point Bleed Q.D.             0.05
  24  Leakage From Fuel Test Port Q.D.                    0.04
  25  Fuel Isolation Valve Fails To Close At APU          0.04
      Shutdown
  26  Fuel Isolation Valve Leaks At Closure After         0.04
      Ascent
  27  Loss Of Power To Secondary Fuel Valve While         0.04
      APU Is Running
  28  Primary Fuel Valve Controller Output Fails On       0.01
      While APU Is Running
  29  Secondary Fuel Valve Controller Output Fails        0.01
      Off While APU Is Running
  30  All Other Failures                                  0.28

      Total                                             100.00

NOTE: Proof-of-concept study results. Not approved
for design evaluation or flight certification.




TABLE 3-3

IMPORTANCE RANKING OF APU FAILURE SCENARIOS
LOC/V - WHOLE MISSION

RANK  FAILURE SCENARIO RISK CONTRIBUTORS            % CONTRIBUTION

   1  Hydrazine leak downstream of fuel isolation        39.1
      valves and into aft compartment during orbit or
      entry that leads to failure of two APUs or
      flight critical equipment

      Contributors:

      a. Leakage from any one APU (100%)

   2  Hydrazine leak as above, but from two or three     26.5
      APUs concurrently

      Contributors:

      a. Leakage from combinations of two APUs (91%)

      b. Leakage from three APUs (9%)

   3  Hydrazine leak from a single APU as above, with     6.4
      an independent failure of another APU

      Contributors:

      a. Hydrazine leak in one APU, with equipment
         failure of another APU while running (see
         below for breakdown into APU failure modes)
         (88%)

      b. Hydrazine leak in one APU, with start
         failure of another APU (see below for
         breakdown into APU failure modes) (12%)

   4  Equipment failure of two APUs during orbit,
      entry, or landing (failures not related to APU
      start)

      Contributors:

      a. Lube oil circulation restricted on two APUs
         (16%)

      b. Primary fuel valve fails closed while
         pulsing on one APU and fuel tank GN2 quick
         disconnect leaks on another APU (7%)

      c. Lube oil circulation restricted in one APU,
         and primary fuel valve fails open while
         pulsing on another APU (6%)

      d. Primary fuel valve fails closed while
         pulsing in two APUs (6%)

      e. Primary fuel valve fails closed while
         pulsing on one APU, and fuel tank diaphragm
         leaks on another APU (4%)

      f. Lube oil circulation restricted in one APU,
         and fuel tank GN2 quick disconnect leaks on
         another APU (4%)

      g. Fuel tank diaphragm leak on one APU, and
         fuel tank GN2 quick disconnect leaks on
         another APU (3%)

      h. Next 36 scenarios have combinations of lube
         oil circulation restricted, tank diaphragm
         leaks, primary fuel valve closure, nitrogen
         leak from fuel tank, MPU failures, turbine
         failures, and loss of power to fuel tank
         isolation valves (34%)

   5  Fail to start one APU at TIG-5 in orbit and         4.0
      equipment failure of second APU while running

      Contributors:

      IMPORTANT APU START FAILURES:

      a. Secondary fuel valve fails to open on
         demand to start (18%)

      b. MPU 1 fails low on demand to start (14%)

      c. Electric power to secondary fuel valve
         fails at start (11%)

      d. MPU 1 fails high* (9%)

      e. MPU 2 fails high* (9%)

      f. MPU 3 fails high* (9%)

      g. Fuel pump bypass valve fails closed (9%)

      h. Fuel pump bypass valve fails open (9%)

      i. Electric power to fuel tank isolation valve
         fails at start (7%)

      IMPORTANT APU EQUIPMENT FAILURES:

      j. Primary fuel valve fails closed during
         pulsing (19%)

      k. Fuel tank GN2 fill quick disconnect fails
         open (13%)

      l. Heaters fail off by common cause (14%)

      m. Lube oil circulation restricted (12%)

      n. Fuel tank diaphragm leaks (8%)

      o. Fuel tank nitrogen leakage (3%)

      p. MPU 2 fails high* (3%)

      q. MPU 3 fails high* (3%)

      r. Turbine wheel failure (3%)

   6  Hydrazine leaks into isolation valve solenoid,      3.8
      auto-decomposes, ruptures valve cover, and
      contents of fuel tank are dumped into aft
      compartment

      Contributors:

      a. Leakage into solenoid cavity (100%)

   7  Turbine comes apart at normal speed during          3.1
      entry; shrapnel and hydrazine effects fail
      a second APU or flight critical equipment

      Contributors:

      a. Turbine wheel comes apart and escapes
         housing (100%)

   8  Hydrazine leak from two APUs as above, with an      1.9
      independent failure of another APU

      Contributors:

      a. Leakage with equipment failure of APU while
         running (100%)

   9  Turbine comes apart at normal speed during          0.9
      ascent; shrapnel effects fail a second APU or
      flight critical equipment

      Contributors:

      a. Turbine wheel comes apart and escapes
         housing (100%)

  10  Equipment failure of one APU during ascent and      0.9
      another during orbit or entry

      Contributors:

      a. Breakdown of APU failures provided
         previously

  11  All Others                                          8.4

      TOTAL                                             100.0

* Later information indicates that MPU fail high may
not be a credible failure mode

NOTE: Proof-of-concept study results. Not approved
for design evaluation or flight certification.
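
The nested percentages in Table 3-3 can be converted to shares of total LOC/V risk by multiplying the scenario contribution by the contributor's share within that scenario. The sketch below applies this to scenario 2 (26.5 percent, split 91 percent and 9 percent); the function name `absolute_share` is hypothetical.

```python
def absolute_share(scenario_pct, contributor_pct):
    """Share of total risk (percent) for one contributor within a scenario."""
    return scenario_pct * contributor_pct / 100.0


# Scenario 2 of Table 3-3: 26.5% of total, split 91% / 9%.
two_apus = absolute_share(26.5, 91)
three_apus = absolute_share(26.5, 9)
print(f"two APUs: {two_apus:.2f}%, three APUs: {three_apus:.2f}%")
```

Because the two contributor shares sum to 100 percent, their absolute shares sum back to the scenario's 26.5 percent.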


TABLE 3-4

IMPORTANCE RANKING OF HPU FAILURE MODES
LOSS OF CREW OR VEHICLE

RANK  COMPONENT/ASSEMBLY RISK CONTRIBUTORS          % CONTRIBUTION

   1  Lube oil circulation restricted                    55.0
   2  Turbine wheel failure                              43.0
   3  Primary control valve transfers                     1.0
      closed while pulsing
   4  All other failures                                  1.0

      TOTAL                                             100.0

NOTE: Proof-of-concept study results. Not approved
for design evaluation or flight certification.
TABLE 3-5

IMPORTANCE RANKING OF HPU FAILURE SCENARIOS
LOC/V

RANK  FAILURE SCENARIO RISK CONTRIBUTORS            % CONTRIBUTION

   1  Equipment failure of 2 HPUs on the same SRB        56.8
      between lift-off and SRB SEP

      Contributors and % Contribution to Scenario 1:

      a. Common cause restriction of lube oil
         circulation causing bearing overheat and
         failure of rotating equipment in the
         gearbox (99%)

   2  Turbine failure leading to shrapnel induced        43.0
      failure of a second HPU or other flight
      critical equipment between lift-off and
      SRB SEP

      Contributors and % Contribution to Scenario 2:

      a. Turbine fragmentation at normal speed
         (100%)

   3  All Others                                          0.2

      TOTAL                                             100.0

NOTE: Proof-of-concept study results. Not approved
for design evaluation or flight certification.
4.0 SYSTEM DESCRIPTIONS


This  section  provides  a brief  technical  description  of  the  two 
Space  Shuttle  subsystems  which  were  the  subjects  of  this  pilot 
study.  These  two  subsystems,  the  Orbiter  Auxiliary  Power 
Unit  (APU)  and  the  Solid  Rocket  Booster  Hydraulic  Power  Unit 
(HPU), are similar in form and function, and share many common
hardware components. However, there are also numerous differences
between them, due to the HPU's less demanding operational
requirements. The HPU operates for about 2.5 minutes during
a flight,  whereas  the  APU  operates  for  approximately  1.5  hours. 
In  addition,  it  is  not  necessary  for  the  HPU  to  start  or  run 
under  zero  gravity  conditions. 
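
The difference in operating time can be put in numbers with a short sketch. The run times are the approximate figures quoted above; the constant failure rate model and the rate value are assumptions made purely for illustration and are not taken from this study.

```python
import math

APU_HOURS = 1.5          # approximate APU run time per flight (from the text)
HPU_HOURS = 2.5 / 60.0   # approximate HPU run time per flight (from the text)

# Ratio of operating exposure per flight, APU relative to HPU.
exposure_ratio = APU_HOURS / HPU_HOURS


def failure_probability(rate_per_hour, hours):
    """P(failure) under an assumed constant failure rate (exponential model)."""
    return 1.0 - math.exp(-rate_per_hour * hours)


# Hypothetical rate, used only to show how exposure scales the result.
ASSUMED_RATE = 1e-4
print(exposure_ratio)
print(failure_probability(ASSUMED_RATE, APU_HOURS))
print(failure_probability(ASSUMED_RATE, HPU_HOURS))
```

For small rates the probability is nearly proportional to run time, so under this simplified model the same hardware would see roughly 36 times the running exposure in the APU role.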

The two subsystems are discussed separately in Sections 4.1
through  4.6.  The  reader  desiring  a more  detailed  description 
is  referred  to  the  references  listed  in  Section  12.0. 


4.1  APU  SYSTEM  DESCRIPTION  AND  OVERVIEW 

The  Space  Shuttle  Orbiter  has  three  independent  hydraulic  systems 
similar  to  those  found  on  large  aircraft.  These  hydraulic  systems 
are  used  to  actuate  the  Orbiter  aero-surfaces,  throttle  and  gimbal 
the Orbiter main engines, deploy and steer the landing gear, apply
the  landing  gear  brakes,  and  retract  the  external  tank/umbilical 
plates  when  the  external  tank  separates  from  the  Orbiter. 

Power  for  the  Orbiter  hydraulic  systems  is  provided  by  three 
identical  APUs,  one  for  each  hydraulic  system.  These  APUs  and 
their  controllers  are  mounted  on  the  forward  bulkhead  of  the 
Orbiter  aft  compartment,  as  shown  in  Figure  4-1,  and  generate 
power  by  means  of  a catalytic  reaction  of  liquid  hydrazine. 


4.2  APU  MISSION  OPERATIONS 

The  APUs  are  operated  by  the  Orbiter  flight  crew,  using  flight 
deck  controls  and  displays.  The  APUs  cannot  be  controlled  by 
ground  command  uplink.  However,  extensive  telemetry  on  APU 
status  is  available  to  Space  Shuttle  ground  controllers. 
In a typical flight, the three APUs are started 5 minutes before
lift-off and operate throughout the launch phase. They are shut
down after the Orbital Maneuvering System (OMS) orbit insertion
burn when hydraulic power is no longer required. The APUs are
restarted for the deorbit burn and entry, and are shut down shortly
after  landing.  In  addition,  one  APU  is  usually  run  briefly  the  day 
before  de-orbit  to  support  a checkout  of  the  Orbiter  flight  control 
system. 

While  the  APUs  are  operating,  they  obtain  lube  oil  cooling  from 
three  separate  water  spray  boilers,  one  for  each  APU.  During  the 
inactive  period  on  orbit,  APU  fluids  are  maintained  within  desired 
temperature  ranges  by  thermostatically  controlled  heaters. 


4.3  APU  DESIGN  AND  FUNCTION 

The  APU  is  designed  to  achieve  a high  output  of  power  in  a 
compact  package.  It  accomplishes  this  by  means  of  a catalytic 
reaction  of  liquid  hydrazine.  This  reaction  produces  a high 
velocity  flow  of  hot  gas,  which  is  used  to  spin  a turbine.  A 
speed  reduction  gearbox  transmits  the  power  of  the  spinning 
turbine  to  the  associated  Orbiter  main  hydraulic  pump. 

Each  APU  consists  of  the  following  subassemblies: 

(a)  Fuel  tank  and  fuel  lines 

(b)  Fuel  isolation  valves  (two  in  parallel) 

(c)  Fuel  pump 

(d)  Gas  generator  valve  module  (two  control  valves) 

(e)  Gas  generator 

(f)  Turbine 

(g)  Gearbox 

(h)  Electronic  controller 

(i)  Exhaust  duct  assembly 

(j)  System  of  heaters  for  orbit  thermal  control 

(k)  Post-shutdown  cooling  system  for  the  fuel  pump/valve  module 

(l)  Hot  start  cooling  system  for  the  gas  generator  injector 

(m)  Fuel/lube  oil  seal  cavity  drain  system 

Figure  4-2  is  a schematic  diagram  of  the  APU  system. 

Since  the  APU  interfaces  directly  with  other  subsystems,  the 
diagram  also  depicts  the  APU  boundary  limits  for  the  purposes 
of  this  study. 
GGVM = GAS GENERATOR VALVE MODULE


The  hydrazine  fuel  supply  is  stored  in  a 28-inch  diameter 
titanium  fuel  tank  and  is  pressurized  with  nitrogen  during 
servicing.  The  gas  pressure  provides  start  capability  through 
the  fuel  pump  bypass  valve  until  the  fuel  pump  is  running,  and 
acts  against  the  tank  diaphragm  to  positively  expel  fuel  to  the 
APU. The fixed-displacement APU fuel pump provides a constant
flow  of  hydrazine  to  the  Gas  Generator  Valve  Module  (GGVM)  after 
the  initial  bootstrap  start.  Approximately  325  lbs.  of  fuel  is 
loaded  into  each  fuel  tank  for  a typical  mission. 

The  APU  turbine  speed  is  controlled  by  the  GGVM.  The  valve  module 
consists of two flapper-type valves in series. The primary or
modulating valve downstream of the pump is normally open and allows
flow  to  the  secondary  or  shutoff  valve.  The  secondary  valve  is 
normally  in  by-pass,  which  directs  hydrazine  flow  back  to  the  pump 
inlet.  In  the  powered  state,  it  allows  hydrazine  flow  to  the  gas 
generator.  The  APU  controller  cycles  the  primary  valve  to  maintain 
proper turbine speed (about 74,000 rpm). In the high speed mode,
the  controller  cycles  the  secondary  valve  to  maintain  a speed  of 
about  81,000  rpm.  For  safety,  the  primary  valve  will  begin  pulsing 
again  to  maintain  a speed  of  about  83,000  rpm  if  the  secondary 
valve  fails  open.  The  gas  generator  (GG)  is  a pressure  vessel 
containing  a granular  catalyst.  Hydrazine  flowing  into  the  GG  is 
decomposed  by  the  catalyst,  producing  hot  gases  which  are  directed 
to  the  turbine  assembly. 
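
The speed control scheme described above can be summarized in a small sketch. The three setpoints are the approximate values quoted in the text. Two points are assumptions made for illustration: that the 83,000 rpm backup applies when the secondary valve fails open in high speed mode, and that valve cycling can be represented by a simple on/off rule. Neither is a description of the actual controller logic.

```python
NORMAL_RPM = 74_000   # primary valve cycles (normal speed)
HIGH_RPM = 81_000     # secondary valve cycles (high speed mode)
BACKUP_RPM = 83_000   # primary valve resumes pulsing (secondary failed open)


def speed_target(high_speed_mode, secondary_failed_open=False):
    """Return (valve being cycled, approximate speed setpoint in rpm)."""
    if not high_speed_mode:
        return ("primary", NORMAL_RPM)
    if secondary_failed_open:
        # Assumed here to apply in high speed mode only.
        return ("primary", BACKUP_RPM)
    return ("secondary", HIGH_RPM)


def valve_open(measured_rpm, setpoint_rpm):
    """Assumed on/off rule: admit fuel only while below the setpoint."""
    return measured_rpm < setpoint_rpm


valve, setpoint = speed_target(high_speed_mode=True)
print(valve, setpoint, valve_open(80_500, setpoint))
```

The layering of setpoints means that a single valve failing open is caught by the other valve at a slightly higher speed rather than leading directly to an overspeed.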

The  dual-pass  turbine  assembly  converts  hot  gas  kinetic  energy 
into  mechanical  shaft  power  at  the  desired  speeds  to  operate  the 
hydraulic pump, APU lube oil pump, and APU fuel pump.

The  speed-reducing  gearbox  contains  gears,  bearings,  seals,  and 
a scavenger  lubrication  system.  The  gearbox  is  pressurized 
with  nitrogen  to  prevent  vaporization  of  the  lubricant.  A lube 
oil  pump  circulates  the  lube  oil  to  the  hydraulic  system  water 
boiler  for  cooling.  The  gearbox  has  a make-up  pressurization 
system  consisting  of  a small  GN2  bottle  and  a solenoid  shutoff 
valve  actuated  by  the  controller. 

The  APU  electronic  controller  provides  turbine  speed  control 
based  on  rotational  speed  sensors,  logic  for  APU  startup  and 
shutdown,  signal  conditioning,  gas  generator  catalyst  bed 
heater control, gearbox make-up pressure control, and malfunction
detection capability (flight crew alert signals to
the Orbiter caution and warning system). Each controller is
located  remotely  from  its  respective  APU.  One  is  located  in 
each  of  the  three  aft  avionics  bays. 
The APU fuel tanks are mounted on the sidewalls of the Orbiter
aft  compartment.  Fuel  tanks  are  located  7 to  9 feet  away  from 
their  respective  APUs. 

The exhaust duct assembly directs the APU exhaust products overboard
through an exit at the upper aft fuselage skin. Exhaust
duct  assemblies  1 and  2 are  located  on  the  port  side  and  duct  3 
is  on  the  starboard  side  of  the  aft  fuselage  at  the  base  of  the 
vertical  stabilizer. 

All  APU  fluid  components  (pumps,  valves,  lines)  are  equipped 
with thermostat-controlled heaters to maintain fluid temperatures
in proper ranges during the APU quiescent period on orbit
and  pre-launch.  Heaters  are  also  used  to  maintain  the  gas 
generator  bed  at  a proper  temperature  for  APU  start-up. 

The fuel pump and gas generator valve modules are maintained
below 200 °F during the heat soakback period, after APU shut
down, by a water spray system consisting of two water tanks
and associated lines, switches, thermostats, and timers.
This system is only required on orbit when convective cooling
is insufficient to cool these components. Temperatures above
200 °F can cause partial decomposition of the hydrazine fuel,
with  potential  for  detonation  at  APU  start-up  if  hydrazine 
bubbles  have  not  collapsed  as  the  APU  cools  down. 

A single  water  tank  with  lines  to  all  three  APUs  is  provided 
to  cool  the  gas  generator  injector  should  an  APU  restart  be 
required  before  the  gas  generator  can  cool  naturally.  Control 
is  via  the  APU  controller.  Starting  a hot  APU  without  this 
cooling  risks  detonation  of  the  APU. 


4.4  HPU  SYSTEM  DESCRIPTION  AND  OVERVIEW 

The  Space  Shuttle  SRB  Solid  Rocket  Motor  nozzle  steering  is 
controlled  by  the  SRB  Thrust  Vector  Control  (TVC)  system. 

The  SRB  TVC  System  for  each  SRB  consists  of  two  HPUs,  two 
servoactuators, and two APU control assemblies. The HPUs are
located  on  the  SRB  aft  skirt  between  the  two  servoactuators,  as 
shown  in  Figure  4-3.  Each  HPU  is  driven  by  a hydrazine-powered 
turbine. The HPU provides hydraulic fluid flow to the servoactuator
to obtain the proper thrust vectoring.

The two servoactuators provide nozzle gimbaling in the SRB rock
and tilt axes (one dedicated servoactuator for each axis).

Each  HPU  is  dedicated  to  a single  servoactuator  during  normal 
operation.  If  a single  HPU  fails,  the  remaining  unit  increases 
its  power  output  and  controls  the  nozzle  position  in  both  the 
rock  and  tilt  planes  at  slightly  reduced  gimbal  rates. 


4.5  HPU  MISSION  OPERATIONS 

The  HPUs  are  started  by  a signal  from  the  Launch  Processing 
System  (LPS)  and  operate  autonomously  through  the  SRB  boost 
phase.  The  HPUs  are  not  controlled  by  the  crew  or  ground 
command  uplink.  However,  extensive  HPU  telemetry  is  available 
to  Space  Shuttle  ground  controllers. 

In  a typical  flight,  the  four  HPUs  are  started  31  seconds 
before  lift-off  and  operate  until  HPU  power  deadfacing  at  SRB 
separation (approximately 2 minutes after lift-off).


4.6  HPU  DESIGN  AND  FUNCTION 

The  HPU  is  very  similar  to  the  Orbiter  APU,  but  differs  in  the 
following  ways: 

a.  No  active  cooling  of  any  kind 

b.  No  external  insulation,  except  on  the  fuel  tank 

c.  No  fluid  system  heaters 

d.  Smaller  fuel  tank 

e.  Simpler  electronic  controller 

f.  No  automatic  overspeed  or  underspeed  shutdown 

g.  No  flight  crew  control  or  monitoring  interface 

h.  No  in-flight  restart  capability 

i.  Different  type  of  fuel  control  valves 

j. Different speed selection scheme

k.  One  fuel  tank  isolation  valve  rather  than  two  in  parallel 

l.  No  active  gearbox  pressurization  system 

m.  Stronger  turbine  containment  ring 

The  Hydraulic  Power  Unit  comprises  the  following  subassemblies: 

a.  Fuel  Supply  Module  (FSM) 

b.  Fuel  Isolation  Valve  (FIV) 

c.  Fuel  Pump 
d. Gas Generator Valve Module (two Control Valves)

e. Gas Generator

f. Turbine

g. Gearbox

h. Electronic Controller

i. Exhaust Duct Assembly

j. Fuel/Lube Oil Seal Cavity Drain System

A schematic diagram of the HPU System is provided in Figure 4-4.

The  FSM  is  a spherical  pressure  vessel,  15  inches  in  diameter, 
which  contains  approximately  32  pounds  of  hydrazine  (N2H4) 
at  mission  start.  The  FSM  is  pressurized  with  GN2  to  deliver 
the  N2H4  to  the  HPU  fuel  pump  at  start  up.  Fuel  is  introduced 
to  the  HPU  by  electrically  commanding  the  fuel  isolation  valve 
and  the  secondary  control  valve  open.  The  GN2  pressure  provides 
start  capability  through  the  fuel  pump  bypass  valve  until  the 
pump  is  running.  The  fixed-displacement  fuel  pump,  driven  by 
the  turbine/gearbox,  provides  a constant  flow  of  hydrazine  to 
the  valve  module  after  the  initial  bootstrap  start. 

The  power  generating  portion  of  the  HPU  is  referred  to  as 
the  APU.  The  APU  consists  of  a fuel  pump,  a gas  generator  valve 
module  (which  consists  of  a primary  and  a secondary  speed 
control valve connected in series), a gas generator, a dual-pass
turbine, a fixed-ratio gearbox, and various check, service
and  relief  valves  to  effect  control  for  the  APU. 

Turbine  speed  is  controlled  by  the  Gas  Generator  Valve  Module 
and  the  HPU  controller.  The  primary  or  modulating  valve 
downstream  of  the  pump  is  normally  open  and  allows  flow  to 
the  secondary  or  shutoff  valve.  The  secondary  valve  is  normally 
in  by-pass,  which  directs  hydrazine  flow  back  to  the  pump  inlet. 
In  the  powered  state,  it  allows  hydrazine  flow  to  the  gas 
generator.  The  HPU  controller  cycles  these  valves  to  maintain 
proper  turbine  speed. 

The  HPU  controller,  located  in  the  Aft  Integrated  Electronics 
Assembly  (IEA)  of  the  SRB,  provides  control  of  the  HPU.  The 
IEA  is  located  on  the  exterior  surface  of  the  SRB  casing, 
above  the  aft  skirt.  It  monitors  the  turbine  speed  through 
signals  received  from  two  Magnetic  Pickup  Units  (MPU)  located 
on  the  APU  turbine  shaft  and  controls  the  fuel  flow  to  the 
APU.  Fuel  flow  is  controlled  by  opening  and  closing  the  pulse 
(primary)  control  valve  and/or  the  shut  off  (secondary)  control 
valve.  Prior  to  HPU  start-up,  the  primary  valve  is  normally 
open and the secondary valve is normally closed. The fuel
isolation  valve  and  the  secondary  control  valve  are  opened  at 
start-up,  allowing  pressurized  fuel  from  the  FSM  to  flow  to 
the gas generator. As the turbine reaches 100 percent speed
(72,000 rpm), a signal from the controller pulses the primary
control  valve  to  maintain  100  percent  speed. 

A reduction  or  loss  of  primary  HPU  hydraulic  pressure  will 
cause  closure  of  a switch  in  the  associated  servoactuator 
which  will  inhibit  the  secondary  HPU  100  percent  circuit  and 
enable  its  110  percent  (79,200  rpm)  primary  valve  controller 
circuit.  This  increased  APU  speed  provides  additional 
hydraulic  flow  capacity  for  driving  two  servo-actuators. 
Restoration  of  hydraulic  pressure  in  the  failed  system  will 
move  the  servo-actuator  switching  valve  back  to  the  primary 
position  allowing  the  formerly  failed  system  to  again  supply 
hydraulic  pressure  to  its  actuator. 

The  secondary  control  valve  is  controlled  by  the  112  percent 
control  circuit.  A primary  valve-open  failure  will  cause 
the  APU  speed  to  increase.  When  the  shaft  speed  reaches  112 
percent  (80,640  rpm)  the  secondary  valve  and  control  circuit 
will  maintain  that  speed. 
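
The speed setpoints quoted above are mutually consistent with a 100 percent speed of 72,000 rpm, since 79,200 / 1.10 = 80,640 / 1.12 = 72,000. The following minimal Python sketch illustrates the setpoint selection described in this section under that assumption. The function and argument names are illustrative only and are not taken from the HPU controller design.

```python
# Assumed 100 percent speed, inferred from 79,200 / 1.10 and 80,640 / 1.12.
RATED_RPM = 72_000


def setpoint_percent(other_hpu_pressure_lost: bool,
                     primary_valve_failed_open: bool) -> int:
    """Return the turbine speed (in percent) that this HPU is expected to hold."""
    if primary_valve_failed_open:
        # The secondary valve and its 112 percent circuit take over speed control.
        return 112
    if other_hpu_pressure_lost:
        # The 100 percent circuit is inhibited and the 110 percent circuit enabled.
        return 110
    return 100


def setpoint_rpm(percent: int) -> int:
    """Convert a percent setpoint to shaft speed in rpm."""
    return RATED_RPM * percent // 100
```

For example, setpoint_rpm(setpoint_percent(True, False)) gives the 110 percent speed of 79,200 rpm.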

The  exhaust  duct  assembly  directs  the  APU  exhaust  products 
overboard  through  an  exit  at  the  outboard  side  of  the  SRB  aft 
skirt.


5.0  STUDY METHODOLOGY


5.1  THE  PURPOSE  OF  PRA 


The  purpose  of  Probabilistic  Risk  Assessment  (PRA)  is  to  provide 
a basis  for  making  decisions.  When  PRA  is  applied  to  existing 
equipment, like the Auxiliary Power Unit (APU) and Hydraulic Power
Unit  (HPU)  subsystems,  the  purpose  is  to  identify  and  evaluate  the 
risks  and  to  assure  that  any  weak  spots  are  not  overlooked.  These 
results  can  be  used  to  make  day-to-day  decisions,  for  example, 
how  to  allocate  scarce  resources,  to  improve  performance,  reduce 
cost,  or  increase  safety. 


5.2  THE  STRUCTURE  OF  A DECISION 

Like  most  other  engineered  systems,  a space  vehicle  necessarily 
involves  a degree  of  risk  in  its  operation.  Intelligent  design 
and  operating  decisions  can,  however,  control  the  amount  of  risk. 
Sometimes  it  is  possible  through  a flash  of  insight  to  change  or 
simplify  a design  in  a way  that  not  only  reduces  risk  but  also 
improves  performance  and  reduces  the  cost.  Often,  however,  risk 
reduction  involves  increased  cost  or  reduced  performance.  The 
task  of  engineering,  mission  operations,  and  program  management 
is to strike an optimal balance between risk, cost, and performance.
The balance is struck and fine-tuned through day-by-day
decisions,  as  the  design,  construction,  and  operation  continue. 

In the flash-of-insight cases, the decisions are easy to make.
In the usual case though, tradeoffs are required. In these
situations,  it  is  useful  and  necessary  to  have  quantitative 
measures  that  show  how  much  risk  is  being  weighed  against  how 
much  cost  and  performance.  These  variables  are  often  difficult 
to  analyze  and  require  complex  models  to  quantify.  Cost,  for 
example,  increases  by  redesign  but  may  be  reduced  by  future 
performance  at  reduced  risk.  All  these  variables  can  and  should 
be  quantified  for  informed  decisions  about  resource  allocation. 

Figure  5-1  shows  the  anatomy  of  a general  decision  problem. 

Each  decision  option  brings  with  it  a certain  risk,  cost,  and 
performance.  If  these  three  factors  were  precisely  known,  it 
would  be  easy  to  make  the  decision.  What  makes  the  problem 
interesting  in  real  life  is  that  these  variables  are  never 
known  with  complete  certainty.  It  is  important,  then,  to 
quantify these uncertainties as part of the input to the
decision analysis. Figure 5-1 also shows the uncertainties
quantified in the form of probability curves. Each option can
be characterized by a triplet of probability curves. The
decision maker must then choose which triplet (i.e., which
option) he prefers. The role of PRA, as shown in the figure, is
to provide the assessment of risk, including uncertainty, as part
of the input to decision problems. Strictly speaking, PRA per se
is limited to the risk part of the problem, but the same
quantitative way of thinking, the same probabilistic methodology, can
be applied to the cost and performance factors as well.

Quantification is thus a necessary part of optimal decision making.
It also serves admirably as a discipline for separating facts and
evidence from hunches and wishful thinking; for discriminating
between information that is truly relevant to risk and that which
is irrelevant or convenient rationalization; and, very importantly,
for providing a uniform framework and language for documentation
and communication among all parties involved in the project.


5.3  THE QUANTITATIVE DEFINITION OF RISK

A probabilistic  risk  assessment  of  the  APU  and  HPU  equipment  is 
fundamentally  the  same  as  a PRA  of  anything  else  since,  in  all 
cases,  we  seek  to  answer  the  same  three  basic  questions: 

a.  What can happen; i.e., what can go wrong?

b.  How  likely  is  it  to  happen? 

c.  If  it  does  happen,  what  are  the  consequences? 

The answers can be grouped as a triplet

    $\langle s_i, L_i, x_i \rangle$

where

    $s_i$ = a name and/or description of the ith scenario; i.e.,
            an answer to "what can happen"

    $L_i$ = the likelihood of the ith scenario

    $x_i$ = the damage state, i.e., a measure of the damage
            consequent to the ith scenario

Each  such  triplet  thus  constitutes  "an"  answer  to  the  three 
questions.  The  set  of  all  possible  such  triplets  then  constitutes 
"the"  answer  to  the  questions.  This  set  may  therefore  be  adopted 
as the quantitative definition of risk.

Notationally, if we use braces, { }, to denote "set of" and R to
denote "risk", then we may write

    $R = \{ \langle s_i, L_i, x_i \rangle \}$.

Applying  this  definition,  a PRA  of  the  APU  and  HPU  is  a list  of 
all  the  possible  scenarios  that  we  can  envision  originating  in 
failure  or  malfunctions  of  the  APU  or  HPU  equipment  and,  along 
with each scenario, a measure of its likelihood and its consequences.
Damage states ($x_i$), likelihoods ($L_i$), and scenarios
($s_i$) are discussed in the following three sections.
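
The set-of-triplets definition maps naturally onto a simple data structure. The sketch below is one possible representation in Python; the class name, the pairing of scenarios with damage states, and all likelihood values are invented for illustration and are not results of this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskTriplet:
    scenario: str      # s_i: what can happen
    likelihood: float  # L_i: how likely it is (here, occurrences per mission)
    damage: str        # x_i: the damage state that results


# Placeholder entries; the numbers and damage assignments are hypothetical.
risk = [
    RiskTriplet("fuel (hydrazine) leak", 1.0e-3, "PLS"),
    RiskTriplet("turbine overspeed", 1.0e-5, "LOC/V"),
    RiskTriplet("spurious shutdown before launch", 5.0e-3, "LS"),
]


def likelihood_by_damage(triplets):
    """Sum the likelihoods of all scenarios that end in each damage state.

    The sum is meaningful only if the scenarios are mutually exclusive.
    """
    totals = {}
    for t in triplets:
        totals[t.damage] = totals.get(t.damage, 0.0) + t.likelihood
    return totals
```

Grouping by damage state in this way is what later allows the contributors to a given damage state, such as LOC/V, to be ranked.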


5.4  THE DAMAGE INDEX, $x_i$

In the case of the APU and HPU, the damage state, x, of most
interest is Loss of Crew or Vehicle (LOC/V). Other damage states
involved in this study include Intact Aborts (IA), entry at next
Primary Landing Site (PLS) opportunity, and launch delay or Launch
Scrub (LS).


5.5  QUANTIFYING LIKELIHOOD: THE PROBABILITY OF FREQUENCY FORMAT

To  quantify  the  notion  of  likelihood  for  APU  and  HPU  scenarios, 
we  adopt  the  "probability  of  frequency"  format.  That  is,  we 
imagine  a model  or  thought  experiment  in  which  we  have  launched 
many  millions  of  shuttles  under  varying  conditions.  At  the  end 
of  this  experiment  we  could  look  at  the  records  and  ask  "in  what 
fraction of missions did scenario $s_i$ occur?".

We shall denote this fraction by $\phi_i$, and call it the "frequency"
of scenario i, expressed in units of occurrences per mission.
The $\phi_i$ are thus the output of our thought experiment.

If  we  had  actually  run  this  experiment,  we  would  know  these 
frequencies  exactly.  We  have  not  run  it  but  have,  instead,  the 
benefit  of  24  successful  shuttle  missions  and  numerous  tests. 
Thus,  we  know  something  about  these  frequencies  but  do  not  know 
them  exactly.  This  gives  rise  to  uncertainty  about  predicting 
the  likelihood  of  success  of  future  APU  and  HPU  performance. 

We also have the benefit of a data base of APU and HPU malfunctions,
and of analytical calculations about the equipment and
the consequences of failures. Additionally, we have the benefit
of numerous tests of individual APU and HPU components, and the
opinions  and  insights  of  experts  who  have  been  working  with 
Shuttle  systems  and  equipment  for  many  years.  We  have  knowledge 
of  similar  equipment  used  in  other  applications,  and  finally,  we 
have  our  general  engineering  knowledge. 

All  this  information  can  be  used  to  make  inferences  about  the 
numerical values of the frequencies, $\phi_i$. The format in which
such  inferences  are  expressed  is  that  of  a probability 
distribution,  hence  the  name  "probability  of  frequency  format." 
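
As an illustration of the probability of frequency format, the sketch below summarizes a state of knowledge about a single frequency by its 5th, 50th, and 95th percentiles and its mean. It assumes a lognormal curve described by a median and an error factor (the ratio of the 95th percentile to the median). The lognormal form, the function name, and any input values are assumptions made for the example; the text above does not prescribe a distribution family.

```python
from math import exp, log
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()
Z95 = STANDARD_NORMAL.inv_cdf(0.95)  # standard normal 95th percentile


def lognormal_summary(median: float, error_factor: float) -> dict:
    """Percentiles and mean of an assumed lognormal state-of-knowledge curve."""
    # Log-space standard deviation implied by the error factor.
    sigma = log(error_factor) / Z95

    def percentile(p: float) -> float:
        return median * exp(sigma * STANDARD_NORMAL.inv_cdf(p))

    return {
        "5th": percentile(0.05),
        "50th": percentile(0.50),
        "mean": median * exp(sigma ** 2 / 2.0),
        "95th": percentile(0.95),
    }
```

Because the curve is skewed toward high frequencies, the mean lies above the median, which is why a summary usually reports both.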

Such  distributions  will  typically  have  the  appearance  of  Figure 
5-2.  We  refer  to  these  curves  as  "state  of  knowledge"  curves 
since  they  express  our  total  knowledge  (and  lack  of  knowledge) 
about the values of the parameters based on all the information
sources mentioned above.

These  curves  constitute  an  important  numerical  output  of  the  PRA, 
which  is  sometimes  called  a risk  profile.  They  are  one  set  of 
information  useful  for  a decision  analysis.  However,  of  equal  or 
greater  value  is  what  is  learned  in  the  process  of  arriving  at 
these  curves. 

The discipline and rigor of deriving these curves, assembling
the information, and asking the right questions produce great
clarity and communication. It allows us to make decisions with
all  of  our  knowledge  brought  to  bear,  rather  than  with  our 
knowledge  of  worst  case  scenarios  only. 

Furthermore, the structured, scenario-based methodology allows
us  to  determine  the  reasons  that  the  probability  distribution 
has  the  shape  that  it  does.  That  is,  it  allows  us  to  identify 
the  scenarios  and  equipment  that  contribute  to  the  risk  profile, 
and  to  rank  the  contributors  to  risk  in  order  of  importance. 


5.6  IDENTIFYING SCENARIOS

According  to  our  definition  of  risk  in  terms  of  a set  of  failure 
scenarios,  the  first  and  most  important  step  in  a risk  assessment 
is  to  identify  these  scenarios.  First,  any  scenario  that  we  can 
describe  in  a finite  number  of  words  is  actually  a category  of 
scenarios.  Thus  "the  pipe  breaks"  is  a category  that  includes 
as  subcategories,  "the  pipe  breaks  longitudinally,"  "there  is  a 
double-ended  guillotine  break,"  "the  pipe  breaks  in  such  and  such 
location," etc. Our first principle therefore is that the word
"scenario" is taken to mean "category of scenarios."

A second point is that since our objective is to identify all
possible significant scenarios, any method that helps us do that
is  good.  Any  new  way  of  looking,  any  new  way  of  categorizing 
that  helps  us  be  sure  we  have  not  overlooked  any  significant 
scenarios  is  good,  so  it  is  perfectly  all  right  to  use  more  than 
one  approach  to  scenario  identification. 

A third  point  is  that  in  any  specific  PRA  application  there  are 
likely  to  be  a huge  number  of  possible  scenarios.  Clearly  then, 
the  scenario  list  must  be  organized  in  some  way  to  allow  it  to  be 
analyzed  efficiently.  How  this  is  done  in  any  instance  is  partly 
a matter  of  personal  preference  and  partly  a matter  of  modeling 
skill.  A general  methodology  for  this  structuring  of  scenarios 
is  presented  in  Section  5.7. 

5.7  STRUCTURING  THE  SCENARIO  LIST 

To  structure  the  scenario  list  for  the  APU  and  HPU,  we  adopt  the 
following  concepts. 

a.  What  we  call  a scenario  is  by  definition  a departure  from 
the  "as  planned"  flight  of  the  vehicle. 

b.  Any  such  departure  from  plan  must  originate  in  some 
initiating  failure  as  in  Figure  5-3. 

c.  From  each  such  initiating  failure,  or  initiating  event,  a 
"tree"  of  possible  scenarios  emerges  as  shown  in  Figure  5-4. 
The  branch  points  in  this  tree  represent  further  events 
which  can  be  new  failures,  independent  of  the  initiating 
event,  or  which  can  be  dependent  or  cascade  failures.  A 
cascade  or  dependent  failure  is  one  which  happens  as  a 
consequence  of  the  original  failure. 

These  three  concepts  provide  us  with  key  ideas  for  structuring  the 
set  of  scenarios;  namely,  first  define  a finite  set  of  possible 
initiating  failure  categories  and  then  define,  from  each  initiating 
failure  category,  a finite  set  of  subsequent  failure  scenarios. 
Since  each  initiating  failure  is  a category,  just  as  each  scenario 
is  a category,  we  can  achieve  finiteness  by  judicious  definition  of 
the  categories.  The  categories  should  be  mutually  exclusive  and 
complete.  Thus,  any  actual  physical  initiating  failure  must  fall 
in  one  and  only  one  of  our  set  of  initiating  failure  categories. 
Similarly,  any  actual  emerging  scenario  must  fall  in  one  and  only 
one  of  the  finite  set  of  scenario  categories  that  we  define  for 
that initiating failure.

[Figure labels: WHAT FAILED; MULTI-EVENT - COINCIDENTAL; PROPAGATING FAILURE; PROPAGATING AND COINCIDENCE]

We  refer  to  the  process  of  defining  a complete  and  finite  set 
of  mutually  exclusive  categories  as  "partitioning"  the  set  of 
possible  scenarios.  Let  us  then  continue  our  line  of  thought 
by  looking  more  deeply  into  how  this  partitioning  can  be 
accomplished.  We  begin  with  the  initiating  failures. 

d.  The  initiating  failure  must  occur  in  some  part  or  subsystem 
of  the  vehicle  and  it  must  occur  at  some  time  or  during  some 
phase  of  the  mission.  Thus,  we  can  label  an  initiating 
failure  by  saying  what  happened  and  when  it  happened. 

e.  Furthermore,  by  partitioning  the  mission  time  and  the  set 
of  possible  failures  into  discrete  units  we  can  establish  a 
categorization  scheme  for  initiating  failures. 


5.7.1  Master  Logic  Diagrams 

For  the  purpose  of  the  present  study  we  have  partitioned  the 
mission time into five mission phases: prelaunch, ascent, orbit,
entry/landing, and post wheelstop.

To  partition  and  structure  the  set  of  possible  failures,  i.e., 
the  "what  happened"  coordinate  of  the  initiating  event,  we 
adopt a device called a master logic diagram (MLD). This
device  allows  us  to  systematically  think  out  a question  like: 
During  ascent,  how  can  LOC/V  occur?  At  the  top  level  of 
Figure  5-5,  for  example,  LOC/V  can  occur  only  if  there  is  loss 
of  thrust,  loss  of  control,  loss  of  structural  integrity,  etc. 
Thus,  at  the  second  level  we  have  partitioned  the  set  of 
possible  failures.  In  the  third  level,  each  of  these  partitions 
is  subdivided  further,  and  so  on.  The  bottom  level  provides 
failure  mode  categories  associated  with  an  APU  or  HPU. 

The  lowest  level  of  breakdown  constitutes  a complete  set  of 
discrete  initiating  failure  categories.  For  the  present  study 
we  pursue  only  those  few  of  these  categories  that  involve 
initiating  failures  in  the  APU  or  HPU  equipment. 

In  this  way,  for  example,  we  arrive  at  the  following  APU  and 
HPU  initiating  failures,  which  have  the  potential  to  lead  to 
one  of  the  damage  states: 

a.  Turbine overspeed

b.  Fuel  (hydrazine)  leak 

c.  Exhaust gas leak

d.  Spurious overspeed or underspeed shutdown (APU only)

e.  Other  failures  leading  to  permanent  shutdown  of  APU  or 
HPU 

Included  in  hydrazine  leaks  are  those  that  cause  hydrazine  to 
enter the aft compartment, go overboard, or enter the solenoid
cavity  of  solenoid  valves. 
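
The "what happened" and "when it happened" labeling can be sketched as follows. The five mission phases and the five initiating failure categories are those listed above; the phase boundary times are hypothetical values chosen only so that the example runs, and the function names are illustrative.

```python
from bisect import bisect_right

PHASES = ["prelaunch", "ascent", "orbit", "entry/landing", "post wheelstop"]
# Hypothetical phase boundaries in hours from liftoff (four cut points, five phases).
BOUNDARIES = [0.0, 0.15, 168.0, 169.0]

CATEGORIES = [
    "turbine overspeed",
    "fuel (hydrazine) leak",
    "exhaust gas leak",
    "spurious overspeed or underspeed shutdown",
    "other failure leading to permanent shutdown",
]


def phase_of(t_hours: float) -> str:
    """Map a time to exactly one mission phase."""
    # The cut points split the time axis into non-overlapping intervals that
    # cover every time, so the phases are mutually exclusive and complete.
    return PHASES[bisect_right(BOUNDARIES, t_hours)]


def label(what: str, t_hours: float) -> tuple:
    """Label an initiating failure by what happened and when it happened."""
    if what not in CATEGORIES:
        raise ValueError(f"unknown initiating failure category: {what}")
    return (what, phase_of(t_hours))
```

The catch-all category "other failure leading to permanent shutdown" is what makes the "what happened" list complete; without it, an unanticipated failure would fall outside every category.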

Once  the  initiating  events  are  thus  defined  the  next  step  is  to 
define  the  set  of  possible  scenarios  emanating  from  each.  For 
this purpose two further diagrammatic devices are used: Event
Sequence Diagrams (ESD) and Event Trees (ET).


5.7.2  Event  Sequence  Diagrams 

Event  Sequence  Diagrams  are  flow  charts  that  diagram  the  initial 
failures,  subsequent  independent  events  and  cascading  events  that 
could  occur  to  form  a scenario.  The  ESD  graphically  presents  the 
flow of all reasonable combinations of events; i.e., all reasonable
scenarios. It associates each scenario with a damage state.
The example event sequence diagram of Figure 5-6 shows a diagram
of boxes and lines. The words in each box may be interpreted as
a question asking if the event occurs. Horizontal lines leading
from left to right between boxes indicate a "yes" answer (Y in
Figure 5-6) to the question. The next event to the right, therefore,
would follow a successful event of the left box. Vertical
lines  indicate  a "no"  answer  to  the  question.  The  next  event 
down,  therefore,  would  follow  a failure  event  in  the  top  box. 

A path  of  lines  and  boxes  from  the  initiating  failure  to  and 
including  a damage  state  is  called  a scenario. 

This  study  identified  and  structured  scenarios  that  incorporated 
three  types  of  propagating  (dependent)  failures.  The  first  type 
is  called  a "functional  interaction."  In  this  type,  the  first 
piece  of  equipment  to  fail  (e.g.,  a driver  for  the  APU  secondary 
fuel  control  valve)  causes  the  second  piece  of  equipment  to 
stop  working  (e.g.,  the  secondary  valve)  because  the  second 
piece  depends  on  the  first  piece  to  function;  i.e.,  the  driver 
provides  electric  power  that  keeps  the  secondary  valve  open. 

The  second  type  is  called  a "spatial  interaction."  In  this 
type,  a second  equipment  failure  occurs  by  virtue  of  the  first 
equipment  failure  because  of  the  spatial  proximity  of  the  two 
pieces  of  equipment.  For  example,  the  second  APU  can  fail  by 
virtue  of  a leakage  and  fire  from  another  APU.  The  third  type 
of  dependent  failure  is  called  a "common  cause"  failure.  In 
this case, two or more pieces of nearly identical equipment
fail nearly at the same time (e.g., during the same mission)
because  of  an  identified  defect,  mechanism,  or  cause  common  to 
both.  For  example,  fuel  pumps  can  leak  hydrazine  in  any  or 
all  APUs  during  the  same  mission  because  the  shaft  seals  provide 
a common  weak  spot.  Such  occurrences  are  correlated  because  of 
the  single  cause  and  should  not  be  treated  as  independent, 
uncorrelated  occurrences. 


5.7.3  Event  Trees 

Although  an  event  sequence  diagram  contains  virtually  all  the 
information  needed  to  adequately  depict  scenarios,  it  is  not 
helpful  for  answering  questions  about  the  likelihood  of 
scenarios. In order to do this, an event sequence diagram is
converted  to  an  event  tree.  As  shown  in  Figure  5-6,  the 
events  along  the  top  of  the  tree  correspond  to  the  boxes  (i.e., 
failure  categories)  in  an  event  sequence  diagram.  Sometimes 
"top  events"  represent  multiple  boxes  in  the  event 
sequence  diagram.  An  event  tree  is  amenable  to  computerized 
quantification  of  the  likelihood  of  the  scenarios.  Each  path 
from  "HL"  to  a damage  state  in  the  event  tree  of  Figure  5-6 
is  a scenario. 


Below  each  top  event  in  the  event  tree  there  are  one  or  more 
nodes,  or  branch  points.  Each  node  represents  a decision  about 
the occurrence or non-occurrence of its associated top event in
that particular scenario, and is associated with a likelihood.
The likelihood of occurrence of that top event in each scenario
(i.e., for each node below the top event) depends on the sequence
of events that come before in the scenario; these likelihoods
are "conditional" likelihoods. The likelihood gives the fraction
of time that each of the two branches at that node is followed.
We therefore refer to the conditional likelihoods of the nodes of
the event tree as "split fractions".
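
The way split fractions combine along a path can be sketched in a few lines of Python. The example assumes a tree with two generic top events, A and B, in which the split fraction for B depends on whether A failed. The initiating frequency and all split fraction values are hypothetical, and point values are used here in place of the probability distributions that the study actually carries.

```python
from itertools import product

INITIATOR_FREQ = 1.0e-3           # hypothetical initiating failures per mission
TOP_EVENTS = ["top event A", "top event B"]


def fail_fraction(index: int, history: tuple) -> float:
    """Conditional fraction of time the failure branch is taken at a node.

    `history` holds the outcomes (True = failed) of the earlier top events.
    """
    if index == 0:
        return 0.1
    # The split fraction for B is conditional on what happened at A.
    return 0.5 if history[0] else 0.01


def scenarios():
    """Yield (outcomes, frequency) for every path through the tree."""
    for outcomes in product([False, True], repeat=len(TOP_EVENTS)):
        freq = INITIATOR_FREQ
        for i, failed in enumerate(outcomes):
            f = fail_fraction(i, outcomes[:i])
            freq *= f if failed else 1.0 - f
        yield outcomes, freq
```

Because the two branches at every node are exhaustive, the path frequencies add up to the initiating failure frequency, which is a convenient check on any quantified tree.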


5.8  MULTISTAGE MODELING

The operating configurations of the APU and the Shuttle change over
the duration of a mission, and the scenarios leading to damage
states  change  during  the  mission.  Before  launch  the  APU  and  HPU 
start  and  run  briefly.  Scenarios  during  this  time  would  lead  to 
launch  scrub  or,  much  less  frequently,  to  loss  of  crew  or  vehicle. 
During  ascent,  APU  and  HPU  scenarios  would  be  characterized  by 
routine failure and would lead to aborts, Primary Landing Site
(PLS), or LOC/V. In orbit, APUs do not run except for a brief
period for Flight Control System (FCS) checkout; scenarios are
dominated by standby failures such as leakages, heater failures,
and thermostat failures. Damage states are typically PLS entry,
with a remote chance of LOC/V caused by APU problems. During
entry, the APUs are started and run. During the lower part of the
entry, the flow of air into the compartment containing the APUs
creates a chance of leakage-induced fires. During entry, therefore,
this additional failure mode must be modeled along with
those failures that could occur during ascent.

Since  both  scenarios  and  damage  states  change  with  each  phase  of 
the  mission,  event  sequence  diagrams  are  developed  for  each 
phase.  In  some  cases,  event  trees  are  also  developed  for  each 
phase.  In  this  study  we  found  that  four  event  sequence  diagrams 
could  be  approximated  by  two  stages,  "Stage  A"  and  "Stage  B",  as 
shown  in  Figure  5-7.  The  damage  state  of  Stage  A,  which  begins 
at  APU  start  prelaunch  and  continues  through  APU  shutdown  after 
ascent,  provides  the  initial  conditions  for  Stage  B.  For 
example,  a leakage  may  occur  during  Stage  A which,  in  accordance 
with  flight  rules,  requires  that  an  APU  be  declared  lost  and  a 
PLS  entry  occur.  Stage  B begins  with  the  presumption  that  one 
APU  is  leaking  and  the  mission  time  for  which  the  scenarios  are 
quantified  is  that  of  a curtailed  mission  representative  of  a PLS 
entry  rather  than  that  of  a full  mission. 

Each  stage  is  represented  by  one  or  more  event  trees,  as 
indicated  in  Figure  5-8. 

Multistage  modeling  allows  us  to  identify  failures  that  contribute 
to  the  risk  profile  of  a particular  part  of  the  mission,  provide 
risk  profiles  for  each  stage  of  the  mission,  identify  scenarios 
that would span the entire mission (e.g., one APU fails on ascent
and one APU fails on descent), and provide the risk profile of the
entire  mission. 
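
The hand-off between stages can be illustrated with a short calculation. In the sketch below, the end states of Stage A and their likelihoods, and the conditional LOC/V likelihoods in Stage B, are hypothetical point values; the state names are illustrative and the study itself carries probability distributions rather than single numbers.

```python
# Hypothetical Stage A end states and their likelihoods per mission.
STAGE_A = {
    "nominal": 0.989,
    "one APU lost (PLS entry)": 0.010,
    "LOC/V": 0.001,
}

# Hypothetical likelihood of LOC/V in Stage B, conditional on how Stage A ended.
STAGE_B_LOCV = {
    "nominal": 1.0e-4,
    "one APU lost (PLS entry)": 5.0e-3,
}


def mission_locv(stage_a: dict, stage_b_locv: dict) -> float:
    """Combine the two stages into a whole-mission LOC/V likelihood."""
    total = stage_a["LOC/V"]  # vehicle already lost during Stage A
    for state, likelihood in stage_a.items():
        if state != "LOC/V":
            # Stage B starts from the condition in which Stage A ended.
            total += likelihood * stage_b_locv[state]
    return total
```

With these placeholder numbers the result is 0.001 + 0.989 x 0.0001 + 0.010 x 0.005, or about 0.00115. The separate terms show which stage, and which hand-off condition, contributes most.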


5.9  DETERMINATION  OF  SPLIT  FRACTIONS 

Each  node  in  an  event  tree  requires  a split  fraction.  These 
split  fractions  are  determined  directly  or  by  constructing  a 
logic  model  for  the  node  to  support  development  of  the  split 
fraction.  We  use  the  probability  of  frequency  format  described 
earlier  to  express  our  state  of  knowledge  about  the  split 
fractions.  If  the  top  event  at  a node  is  simple  enough  or  if 
sufficient  data  exists  at  the  node  level,  then  the  probability 
distribution  for  the  split  fraction  is  estimated  directly.  When 
the  top  event  at  a node  represents  a complex  system,  a detailed 
model of the system is required to break the system down into its
component parts. This model defines how failures and successes of
component parts (called basic events in the language of PRA) affect
the failure and success of the top event. Several traditional
methods are available for this. Fault Tree analysis is one of the
more prominent ones, and is the one used in this analysis. Figure
5-8 indicates this by pointing out that a fault tree dealing with
certain parts of the APU is associated with a node.

Figure 5-8  RELATIONSHIP OF SPLIT FRACTION MODELS TO EVENT TREES


5.9.1  Fault  Trees 

The  basic  concept  in  fault  tree  analysis  is  to  find  out  about  a 
complex  unit,  for  which  we  have  little  information,  by  looking 
at  component  parts  about  which  we  have  much  more  information. 
Therefore,  a fault  tree  is  developed  down  to  the  level  at  which 
statistical  failure  and  success  data  may  be  used  to  obtain 
frequencies  of  the  basic  events.  Basic  events  are  denoted  by 
circles  in  Figure  5-8.  We  do  not  develop  a basic  event  for  every 
conceivable  failure  mode  at  a subcomponent  level  if  statistical 
data  exists  at  the  higher  component  level. 

The  parameters  that  we  wish  to  know  about  with  respect  to  the 
basic  event  are  called  "running  failure  rate"  and  "demand 
failure  rate."  A running  failure  rate  is  defined  as  the  number 
of  failures  of  a component  per  unit  time.  It  may  represent  a 
component  that  is  operating  or  one  on  standby.  A demand  failure 
rate  is  the  number  of  failures  of  a component  per  demand  on  it 
to  actuate,  energize,  start  or  stop.  For  example,  items  such  as 
solenoid valves usually are characterized by both parameters: a
demand failure rate when the valve is first called upon to open
and  a running  failure  rate  as  it  operates. 

These  parameters  are  multiplied  by  either  the  duration  of 
operation  (for  running  failure  rate)  or  the  number  of  demands 
(for  demand  failure  rate)  to  obtain  an  "unavailability".  These 
unavailabilities  are  combined,  as  defined  by  the  gates  in  a 
fault  tree,  to  obtain  the  split  fraction  for  the  top  event  at 
the  node. 
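
The arithmetic described above can be sketched as follows, using the solenoid valve as the example. All rates, durations, and the two-valve arrangement are hypothetical. The gate formulas assume independent basic events, which would not hold for the common cause failures discussed in Section 5.7.2, and the rate-times-duration product is the usual approximation for small values.

```python
from math import prod


def running_unavailability(rate_per_hour: float, hours: float) -> float:
    """Running failure rate times duration of operation."""
    return rate_per_hour * hours


def demand_unavailability(rate_per_demand: float, demands: int) -> float:
    """Demand failure rate times number of demands."""
    return rate_per_demand * demands


def or_gate(*q: float) -> float:
    """Output fails if any input fails (inputs assumed independent)."""
    return 1.0 - prod(1.0 - x for x in q)


def and_gate(*q: float) -> float:
    """Output fails only if all inputs fail (inputs assumed independent)."""
    return prod(q)


# Hypothetical valve: fails to open on demand OR fails while operating.
valve = or_gate(
    demand_unavailability(1.0e-3, 1),
    running_unavailability(1.0e-5, 1.5),
)
# Hypothetical top event: two such valves must both fail.
top_event = and_gate(valve, valve)
```

With these placeholder values the single-valve unavailability is about 1.015e-3, and the two-valve top event is its square.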

In general, the frequencies of basic events, like split fractions,
are  expressed  as  probability  distributions.  These  distributions 
express  our  state  of  knowledge  about  the  frequency  of  each  basic 
event.  They  are  developed  by  applying  whatever  analysis, 
calculations,  experience,  relevant  testing,  and  engineering 
judgment is available.


5.9.2  Bayes' Theorem


Bayes' Theorem is a fundamental law of logical inference. It
provides  a mechanism  for  updating  probability  distributions 
which  express  our  current  state  of  knowledge  in  order  to 
incorporate  additional  knowledge.  Thus,  if  actual  flight  data 
or  hot  fire  test  data  are  available  in  addition  to  a previously 
developed  frequency  distribution  for  a basic  event  or  a top 
event,  they  can  be  combined  by  a "statistical  inference" 
process  using  Bayes'  Theorem  (References  100  and  106).  The 
Bayesian  approach  is  capable  of  taking  into  account  both 
engineering  judgment  about  the  event  frequency  and  empirical  data 
such  as  the  actual  number  of  failures  that  were  observed 
during  operation  of  the  APU. 
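
A minimal sketch of such an update for a demand failure frequency is shown below. It assumes that the prior state of knowledge is expressed as a Beta distribution, which is convenient because the updated distribution is again a Beta distribution. The choice of a Beta prior, the function names, and the numbers in the usage note are assumptions made for the example; the references cited above should be consulted for the method actually applied in this study.

```python
def update_beta(alpha: float, beta: float, failures: int, demands: int) -> tuple:
    """Update a Beta(alpha, beta) prior with observed failures in a number of demands."""
    if not 0 <= failures <= demands:
        raise ValueError("failures must lie between 0 and demands")
    # Failures add to alpha; successes add to beta.
    return alpha + failures, beta + (demands - failures)


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)
```

For example, a prior of Beta(1, 99), whose mean is 0.01, updated with zero failures in 50 demands becomes Beta(1, 149), whose mean is 1/150. The engineering judgment enters through the prior, and the operating experience shifts it.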


5.9.3  Expert  Opinion 

Section  5.5  introduced  the  notion  that  the  format  for  quantitative 
expression of knowledge is a probability distribution. Using this
format,  the  PRA  team's  state  of  knowledge  about  the  frequency  of 
each  event  is  expressed  as  a probability  curve.  The  curves  are 
based  on  the  total  body  of  evidence,  data,  experience,  analysis, 
and  information  that  is  available.  Included  in  this  total  body 
of  evidence  are  the  engineering  judgments  of  systems  experts. 

Such judgment is "informal" evidence; it differs from "formal" or statistical evidence, which is
generally  given  in  terms  of  so  many  failures  out  of  so  many 
tries  or  hours.  Both  formal  and  informal  evidence  are  ultimately 
combined,  through  Bayes'  theorem,  to  arrive  at  the  final  state  of 
knowledge  probability  curves. 

The  question  arises  as  to  how  the  experts'  judgments  are  elicited 
and quantified. In cases where we have lots of statistical evidence,
expert knowledge is not an important issue. However, it is
often  the  case  that  informal  evidence  is  a necessary  supplement 
to  sparse  data.  In  some  cases,  it  may  be  all  that  is  available. 

In  the  latter  case,  the  elicitation  and  quantification  process 
must be done with some care and structuring. The following five-part
process has proven effective and was used for this PRA.

a.  Motivating  the  experts  - explain  the  importance  of  the 
assessment,  its  confidentiality,  and  the  fact  that 
information  (not  commitments  or  predictions)  is  the  goal. 

b.  Structuring  the  discussion  - define  the  question  to  be 
answered  about  the  parameter  of  interest,  verify  that  the 
question can be answered, define the units or scale for
answering the question, and explicitly define the inherent
assumptions  in  the  question. 

c.  Preconditioning  the  experts  - informal  discussion  of  the 
parameter  of  interest  to  detect  biases  and  induce  the 
expert  to  reveal  his  true  judgments. 

d.  Encoding  - ask  questions  to  encode  the  experts'  judgment. 

e.  Verifying  - construct  the  probability  distribution  and 
verify  that  the  experts  believe  it  is  valid. 

In  this  project,  a research  step  occurred  before  the  expert 
opinion  group  was  convened.  Written  questions  were  formulated 
during the evolving scenario identification and structuring process.
Some of these questions had to do with certain phenomena
initiated  by  an  APU  failure  that  could  potentially  contribute  to 
cascading  of  damage  in  the  aft  compartment.  Examples  of  these 
questions  are: 

a.  Under  what  conditions  can  hydrazine  leakage  cause  fire 
in  the  aft  compartment? 

b.  What  is  the  potential  damage  done  by  a fire? 

c.  What  are  the  conditions  leading  to  turbine  rotor  failure? 
What  is  the  energy  of  the  fragments?  What  is  the  spray 
pattern?  What  is  the  potential  for  containment? 

d.  What  damage  can  be  caused  by  uncontained  shrapnel  and 
the  accompanying  release  of  hydrazine? 

The  systems  experts  performed  the  necessary  research  and  analysis 
to  answer  these  questions.  The  answers  were  documented  to  serve 
as a basis for the development of conditional probability
distributions.

In preparation for the meeting to elicit expert opinion, a
detailed set of specific scenarios and required probabilities
was defined. Where possible these were reviewed by the systems
experts  before  the  meeting.  The  moderator  began  the  meeting  by 
introducing  the  purpose  of  the  discussion,  methodology  of  PRA, 
and  the  role  of  the  systems  experts. 

The moderator then began the discussion of the first scenario.
He made sure that everyone in the room understood this scenario
exactly and the physical phenomena it is designed to represent.
He made sure similarly that each parameter in this scenario
(mainly the split fractions) was thoroughly understood. Then,
focusing on one parameter at a time, he asked the experts to
discuss the evidence and attempt to quantify this evidence in
terms of probabilities of the occurrence of the scenario. For
example, the moderator described a scenario in which both control
valves failed in the open position, causing a turbine runaway and
turbine disc fragmentation. He then asked the team what would
happen and what was the likelihood of the fragments being
contained.

The  object  was  to  obtain  a team  consensus.  Thus,  individual 
members  initially  proposed  different  distributions,  reflecting 
different  interpretations  and  weighings  of  the  evidence. 

However,  with  enough  discussion,  a single  distribution  was  agreed 
on  that  represented  the  team's  state  of  knowledge  as  a whole. 

It  is  the  moderator's  job,  of  course,  to  manage  this  process  so 
that  all  available  knowledge  is  incorporated  into  the  distribution. 
The  results  of  the  meeting,  the  definitions  of  the  scenarios  and  the 
parameters,  the  specific  evidence  relevant  to  each,  and  the  group's 
probability  distributions  were  documented.  This  provided  a basis 
for  reflection,  reassessment,  and  the  collection  of  new  evidence. 

The  outcome  of  this  process  was  a set  of  probability  distributions 
that  represented  the  group's  knowledge  of  the  likelihood  of  the 
spatial  interaction  split  fractions  in  the  event  trees. 


5.10  QUANTIFYING  SCENARIOS 

The frequency of each path (scenario) in an event tree is obtained
by multiplying the frequency of the initiating event (in
occurrences per mission) by the "split fractions" at every node along
the path.

In Figure 5-9, $\phi(I)$ is the frequency of initiating failure I. Out
of all scenarios starting at I, $f(A|I)$ is the fraction in which
event A happens, given the initiating failure I. The quantity
$1 - f(A|I)$ is then the fraction in which A does not happen.

Our convention is that $\bar{B}$ means "not B". Out of all scenarios in
which I and A happen, $f(\bar{B}|IA)$ is the fraction in which event B
does not happen, and so on. Proceeding in this way, if the path
S is I A $\bar{B}$ C D, as shown in the figure, then the frequency, $\phi(S)$,
of this path is given by the equation in the lower left corner of
Figure 5-9:

$$\phi(S) = \phi(I)\, f(A|I)\, f(\bar{B}|IA)\, f(C|IA\bar{B})\, f(D|IA\bar{B}C)$$


The  process  of  quantifying  scenario  frequencies  then  is  just  the 
numerical  evaluation  of  equations  of  this  type. 
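
As an illustration of this evaluation, the following Python sketch multiplies an initiating-event frequency by the split fractions along one path. The function name `path_frequency` and all numerical values are hypothetical placeholders, not values from this study; they only show the arithmetic.

```python
from math import prod


def path_frequency(initiator_freq, split_fractions):
    """Frequency of one event-tree path (occurrences per mission).

    initiator_freq  -- frequency of the initiating failure
    split_fractions -- fraction taken at each node along the path,
                       each already conditioned on the earlier events
    """
    for f in split_fractions:
        if not 0.0 <= f <= 1.0:
            raise ValueError("split fractions must lie in [0, 1]")
    return initiator_freq * prod(split_fractions)


# Hypothetical inputs for a four-node path.
phi_I = 1.0e-3           # initiating failure frequency per mission
f_A = 0.1                # fraction in which A happens, given I
f_B = 0.2                # fraction in which B happens, given I and A
f_not_B = 1.0 - f_B      # the path follows the "B does not happen" branch
f_C = 0.5                # fraction for the C node, given the earlier events
f_D = 0.05               # fraction for the D node, given the earlier events

phi_S = path_frequency(phi_I, [f_A, f_not_B, f_C, f_D])
print(f"path frequency = {phi_S:.2e} per mission")
```

With these placeholder inputs the path frequency is about 2.0E-6 per mission.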




In a multistage model the scenarios may be grouped according to
their damage states. The $\phi(S)$ values are then added to yield the
frequency of each damage state. For example, the $\phi(S)$ values of
scenarios that lead to LOC/V are summed to give the total
frequency of LOC/V.

A more  accurate  estimation  of  total  LOC/V  frequency  is  obtained 
for  Stage  B in  a multistage  model  if  the  PLS  damage  state  is 
divided  into  groups  called  damage  bins.  Each  damage  bin  is 
characterized  by  a particular  kind  of  damage  to  one  or  more  APUs. 
For example, this study used three such bins: one characterized
by an APU lost, one characterized by one or more APUs leaking,
and one characterized by one APU lost and one leaking. Of course,
a bin in which everything is OK is also defined.

The  frequency  of  each  bin  is  the  sum  of  the  frequencies  of  its 
constituent  scenarios.  Each  bin  serves  as  an  initial  condition 
to  the  next  stage  of  the  model. 

The frequency of LOC/V of Stage B for this study is then a combination
of the contributions of the four bins (three damage bins
and the OK bin) that were the output of Stage A. If we define $\phi_B(\mathrm{LOC/V})$
as the frequency of LOC/V for Stage B and $\phi_A(\mathrm{Bin}\ i)$ as the
frequency of Bin i from Stage A, then

$$\phi_B(\mathrm{LOC/V}) = \sum_{i=1}^{4} \phi_A(\mathrm{Bin}\ i) \sum_{j=1}^{L} \phi_{jB}(\mathrm{LOC/V} \mid \mathrm{Bin}\ i)$$

where $\phi_{jB}(\mathrm{LOC/V} \mid \mathrm{Bin}\ i)$ is the frequency of scenario j from a total of L
scenarios that lead to LOC/V, given Bin i as an initial condition.

The total LOC/V frequency is the sum of $\phi_A(\mathrm{LOC/V})$ and $\phi_B(\mathrm{LOC/V})$.
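
The combination of Stage A bins with Stage B conditional scenario frequencies can be sketched in Python as follows. The bin names mirror the four bins described in this section, but every frequency below is a hypothetical placeholder, and the function and variable names are assumptions made for illustration only.

```python
def stage_b_frequency(bin_frequencies, conditional_scenarios):
    """Stage B LOC/V frequency: each Stage A bin frequency weights the sum
    of the Stage B scenario frequencies conditional on that bin."""
    total = 0.0
    for bin_name, bin_freq in bin_frequencies.items():
        total += bin_freq * sum(conditional_scenarios[bin_name])
    return total


# Hypothetical Stage A output: frequency of ending Stage A in each bin.
bin_frequencies = {
    "ok": 0.97,
    "apu_lost": 0.01,
    "apu_leaking": 0.015,
    "apu_lost_and_leaking": 0.005,
}

# Hypothetical Stage B scenario frequencies leading to LOC/V, given each bin.
conditional_scenarios = {
    "ok": [1.0e-6, 2.0e-6],
    "apu_lost": [1.0e-4],
    "apu_leaking": [2.0e-4, 1.0e-4],
    "apu_lost_and_leaking": [1.0e-2],
}

phi_a_locv = 1.0e-5  # hypothetical LOC/V frequency accumulated in Stage A
phi_b_locv = stage_b_frequency(bin_frequencies, conditional_scenarios)
total_locv = phi_a_locv + phi_b_locv
print(f"Stage B LOC/V = {phi_b_locv:.3e}, total LOC/V = {total_locv:.3e}")
```

In this made-up case the bin with one APU lost and one leaking dominates Stage B even though it is the least likely bin. Keeping the bins separate, rather than lumping them into one damage state, lets a contrast of that kind show up in the result.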


5.11  RISK  DIAGNOSIS 

Having  assembled  the  risk  profile  per  Sections  5.1  through  5.10, 
it  remains  to  interpret  the  risk  curves  and  determine  the 
contributions to risk. Figure 5-10 illustrates this process. A
similar figure would apply to each damage state.

Figure 5-10 RISK DIAGNOSIS


The LOC/V risk profile itself (Figure 5-10a) provides a great
deal of information. We know that the risk is not as great as the
space  to  the  right  of  the  curve  and  not  as  low  as  the  space  to  the 
left  of  the  curve.  We  expect  it  to  be  about  where  the  "hump"  of 
the  curve  is.  We  have,  therefore,  bounded  the  possibilities  and 
told  the  decision  maker  how  certain  we  are  of  the  risk. 


Furthermore, we can identify the risk profiles of the individual
scenarios that are the most important contributors to the total
risk profile (Figure 5-10b). Scenarios that are not the most
important  contributors  to  the  total  risk  profile  should  receive 
less  priority  and  less  attention.  That  is,  scenarios  that  have 
frequencies  toward  the  left  tail  of  the  risk  profile  should  not 
receive  immediate  attention.  Identification  of  scenarios  as 
important  or  unimportant  contributors  to  a damage  state  is 
possible  because  an  event  tree  unambiguously  associates  each 
scenario  with  a damage  state. 


The use of event trees also allows easy identification of the top
events that contribute to each high-risk scenario (Figure 5-10c).

To find which components of the APU or HPU are most important
to each top event, the split fraction model is investigated (Figure
5-10d). This is facilitated by a cause table (Figure 5-10e), which
lists, in ranked order from highest frequency to lowest frequency,
the individual components and combinations of components that
contribute to the top event. The fractional contribution of each
combination of components or individual component to the top event
in a particular sequence is derived from this table. More depth
of information about why a component has a particular failure rate
is found in the data analysis (Figure 5-10f).

Components  of  high  ranking  that  contribute  to  important  scenarios 
should  receive  the  most  attention  for  possible  corrective  actions. 
Components  that  are  ranked  low  in  any  important  scenarios  or  do  not 
appear  in  any  important  scenarios  (no  matter  how  high  the  ranking) 
should  receive  lower  priority. 
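
The ranking step performed with a cause table can be sketched in Python as shown below. This is only an illustration of sorting causes by frequency and computing fractional contributions; the cause labels and frequencies are hypothetical, and the function name `build_cause_table` is an assumption rather than anything used in the study.

```python
def build_cause_table(cause_frequencies):
    """Rank causes of a top event from highest to lowest frequency.

    Returns a list of (cause, frequency, fractional_contribution) tuples,
    where each fraction is the cause frequency divided by the total.
    """
    total = sum(cause_frequencies.values())
    if total <= 0.0:
        raise ValueError("total frequency must be positive")
    ranked = sorted(cause_frequencies.items(),
                    key=lambda item: item[1], reverse=True)
    return [(cause, freq, freq / total) for cause, freq in ranked]


# Hypothetical causes of one top event (single components and a combination).
causes = {
    "component A": 4.0e-5,
    "components B and C together": 1.0e-5,
    "component D": 5.0e-5,
}

table = build_cause_table(causes)
for cause, freq, fraction in table:
    print(f"{cause:30s} {freq:.1e}  {fraction:6.1%}")
```

A table built this way ranks causes within one top event only. Whether a highly ranked component deserves attention still depends on whether that top event appears in an important scenario.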


In  this  way  the  PRA  results  help  establish  the  allocation  of 
resources  for  effective  risk  management. 


5.12  SUMMARY  OF  PRA  METHODOLOGY 

The  previous  ten  sections  discussed  the  PRA  methodology  employed 
in  the  APU/HPU  risk  assessment.  This  section  summarizes  this 
methodology in terms of a procedure shown in Figure 5-11. The 14
steps of the procedure are listed as shown.

Figure 5-11 PROCEDURE FOR APU/HPU QUANTITATIVE RISK ASSESSMENT


Step 1: Study System

A detailed  study  of  the  system  forms  the  basis  of  the  rest  of  the 
PRA.  This  study  includes  such  aspects  as  system  failure  modes, 
modes of operation, interaction with the ground controllers,
interaction with and dependencies on other systems, failure history,
maintenance,  testing,  design  changes,  refurbishment,  environmental 
conditions  when  operating  and  when  not,  and  surveillance  and 
inspection  activities. 


Step  2:  Define  Scope 

As  with  any  other  analysis  or  evaluation,  the  scope  of  effort 
(what  is  to  be  included  and  what  is  to  be  excluded)  and  the 
guiding  groundrules  and  assumptions  are  identified.  Minor 
changes  to  these  are  acceptable  as  the  project  progresses  when 
more  is  learned  about  the  system  that  is  under  assessment. 


Step  3:  Damage  State  and  Mission  Stage  Identification 

A key  element  in  defining  the  work  to  be  done  for  the  rest  of  the 
PRA  is  identifying  the  damage  states  of  interest  and  defining  the 
mission  stages  to  be  analyzed.  This  is  not  considered  part  of 
Step  2 because  considerable  technical  work  must  be  done  in  order 
to  establish  an  appropriate  definition  of  mission  stages. 


Steps 4, 5, & 6: Scenario Structuring

The  development  of  initiating  failure  categories,  event  sequence 
diagrams  and  event  trees  that  identify  scenarios  leading  to  damage 
states  was  described  in  Sections  5.7  and  5.8. 


Steps 7 & 8: Split Fraction Modeling

The use of fault trees to model the top events and the development
of scenario-dependent split fraction models were discussed
in Section 5.9.


Steps 9 and 10: Data Development

This study developed data for three types of events. The first
type (Step 9) was for random equipment failure for each APU (or
HPU). The second type (Step 9) was for common cause failure of
two APUs (or HPUs) together. The third type of data (Step 10)
was for cascading effects associated with a failure that by virtue
of its proximity to other components could cause other components
to fail. Such events are called spatial interaction events in this
study. They result from phenomena such as fires, hydrazine
decomposition, hydrazine chemical attack, other chemical reactions, hot
exhaust gas, and shrapnel from turbine rotor failure. Section 5.10
described the methodology of data development and of determination
of the values of the split fractions.


Steps 11, 12, & 13: Risk Quantification

Combining data with the model, developing the split fractions
from fault trees, and quantifying the multistage event tree model
were described in Sections 5.8, 5.9, and 5.10. The result of Step
13 is the risk profile for each damage state.


Step 14: Risk Diagnosis

The procedure and usefulness of disassembling the results to find
the constituent contributors to the risk profile were described in
Section 5.11.


6.0 AUXILIARY POWER UNIT (APU) SCENARIO PRESENTATION


For  purposes  of  this  analysis,  APU  operations  were  divided  into 
five  mission  phases  (prelaunch,  ascent,  orbit,  entry,  and  post 
landing),  as  shown  in  Figure  6.0-1. 

The  model  was  developed  using  five  mission  phases;  however,  it 
was  concluded  that  quantification  could  be  accomplished  using 
only  two,  as  shown  in  Figure  6.0-1,  in  order  to  reduce  model 
complexity. Stage A extends from APU prelaunch start-up to APU
post-ascent shutdown. Stage B extends from the end of Stage A to
APU post landing shutdown. The periods prior to APU startup
prelaunch, and after APU shutdown post landing, were omitted from the
analysis due to APU non-operation.

In the subsections below, the methodology described in Section
5.0 is traced step-by-step through an analysis of the APU
Subsystem. The results of this analysis provide the framework or
model, which can then be evaluated using the failure frequency
data described in Section 7.0.

Section  6.1  details  the  ultimate  damage  states  selected  for  the 
analysis.  Section  6.2  details  the  Master  Logic  Diagrams  (MLDs) 
developed  to  show  how  APU-related  initial  failure  categories 
can  lead  to  these  damage  states. 

The event sequence diagrams are presented in Section 6.3. These
flow diagrams illustrate the scenarios leading to different
damage states as a consequence of various categories of APU
failures. The APU failure categories and different damage states
developed in the event sequence diagrams provide the framework for
development of the event trees, presented in Section 6.4.

The  event  trees  establish  the  decision  points  (called  nodes)  for 
which  specific  probabilities  (called  split  fractions)  must  be 
determined  in  order  to  arrive  at  overall  probabilities  for  the 
ultimate damage states. The event trees are similar to decision
diagrams: each decision point is posed as a "yes/no"
question. Each path through the event tree results in either a
damage  state  or  a state  of  no  damage,  based  on  the  cumulative 
effect  of  all  failures  in  that  path. 
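
As a rough illustration of how yes/no nodes and split fractions generate paths and damage-state frequencies, the following Python sketch enumerates every path through a small two-node tree. The node names, the split-fraction values, and the mapping from outcomes to damage states are hypothetical and chosen only for illustration; the study's actual event trees are presented in Section 6.4.

```python
from itertools import product


def enumerate_paths(initiator_freq, nodes, split_fraction):
    """Yield (history, frequency) for every path through a yes/no event tree.

    nodes          -- ordered list of node names
    split_fraction -- function(node, history) giving the fraction of
                      scenarios in which `node` occurs, given the tuple of
                      earlier (node, occurred) outcomes
    """
    for outcomes in product([True, False], repeat=len(nodes)):
        freq = initiator_freq
        history = ()
        for node, occurred in zip(nodes, outcomes):
            f = split_fraction(node, history)
            freq *= f if occurred else 1.0 - f
            history += ((node, occurred),)
        yield history, freq


# Hypothetical two-node tree.
NODES = ["turbine_overspeed", "fragments_uncontained"]


def split_fraction(node, history):
    earlier = dict(history)
    if node == "turbine_overspeed":
        return 0.01
    if node == "fragments_uncontained":
        # Fragments can only escape if the turbine has oversped.
        return 0.3 if earlier["turbine_overspeed"] else 0.0
    raise KeyError(node)


def damage_state(history):
    outcome = dict(history)
    if outcome["turbine_overspeed"] and outcome["fragments_uncontained"]:
        return "LOC/V"
    return "no damage"


totals = {}
for history, freq in enumerate_paths(1.0e-2, NODES, split_fraction):
    state = damage_state(history)
    totals[state] = totals.get(state, 0.0) + freq

print(totals)
```

Because each node splits its incoming frequency into a "yes" part and a "no" part, the frequencies of all paths add back up to the initiating-event frequency, which gives a simple consistency check on a quantified tree.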

Determination  of  each  event  tree  decision  point,  or  split  fraction, 
depends  on  a logical  combination  of  events,  which  is  expressed  in 
the  form  of  a fault  tree.  Development  of  these  fault  trees,  or 
split fraction models, is presented in Section 6.5.

Figure 6.0-1. APU Operational Phases and Modeling Stages


Section 6.6 deals with the analysis of a special class of events
called Spatial Interaction Events (SIEs). These are events by
which a failing APU can cause damage to other APUs or to
other vital equipment in the Orbiter aft compartment. The
mechanisms of such occurrence might be shrapnel, fire, chemical
attack, and hot gas impingement.


6.1  DAMAGE  STATES 

A damage  state  is  the  outcome  of  a scenario.  A damage  state 
is  usually  an  undesired  event  selected  because  of  a need  to 
understand  its  frequency  of  occurrence. 

The ultimate damage states selected for this study were not
peculiar to the APU or the HPU under study, but were of a broad
category which would encompass any of the Space Shuttle's
subsystems. In addition, the damage states were selected to be
consistent with the NASA Failure Mode and Effects Analysis (FMEA)
as defined in NSTS 22206 (Reference 29). The ultimate damage
states selected were:

a.  Loss  of  crew  and/or  vehicle 

b.  Loss  of  mission 

Loss  of  mission  implies  that  the  ability  to  perform  all  or  a 
substantial  portion  of  the  payload-related  activities  was  lost. 
However,  this  study  did  not  address  any  particular  payload. 

Loss  of  crew/vehicle  is  self-explanatory. 

These  damage  states  were  examined  for  each  of  the  five  mission 
phases  (defined  for  the  analysis  as  prelaunch,  ascent,  orbit, 
entry,  and  post  landing)  to  determine  which  damage  states  were 
applicable  during  each  of  the  phases.  The  results  are  presented 
in  Table  6.1-1. 

Loss  of  mission  was  not  judged  to  be  a viable  damage  state  for 
the  entry  and  post  landing  phases. 

Once  the  damage  states  for  the  phases  were  defined,  the  next  step 
in  the  study  was  to  develop  a set  of  Master  Logic  Diagrams  (MLDs) 
using  the  ultimate  damage  states  as  the  Top  Events.  This  process 
is discussed in Subsection 6.2.

Table 6.1-1
DAMAGE STATE APPLICABILITY

                                SPACE SHUTTLE MISSION PHASE

DAMAGE STATE      PRELAUNCH     ASCENT        ORBIT                    ENTRY  POST LANDING
                  (1)           (2)           (3)                      (4)    (5)

Loss of Crew/     X             X             X                        X      X
Vehicle

Loss of Mission   X             X             X                        N/A    N/A
                  Launch Scrub  Intact Abort  Enter ASAP
                                              First Day PLS Entry
                                              Next PLS Entry
                                              Minimum Duration Flight

6.2  MASTER  LOGIC  DIAGRAM  DEVELOPMENT 


6.2.1  General  Development  Process 

With  a set  of  ultimate  damage  states  established  for  each  mission 
phase,  the  next  step  was  to  determine  if  and  how  failures  initiated 
in the APU system could contribute to these damage states. The
MLD served to guide and document this thought process. Appendix
B6.2 contains the MLDs developed for this study.

The ultimate damage states established for each mission phase
represent  the  "top  events"  of  the  MLD  for  that  phase.  The  approach 
taken  was  to  develop  the  second  level  of  each  diagram  in  the  form 
of  broad  general  categories,  rather  than  immediately  focusing  on 
the  APUs.  This  "top  down"  approach  keeps  the  analyst  open  to  the 
possibility  of  unanticipated  failure  effects  involving  the  APUs. 

It  also  allows  the  diagrams  to  serve  as  a general  framework  for 
analysis  of  other  Space  Shuttle  systems.  Just  below  the  top  event 
is  the  "second  level"  which  comprises  six  general  Shuttle  functions 
whose  failure  would  cause  the  top  event.  Some  of  these  Shuttle 
functions  were  not  developed  further,  as  there  appeared  to  be  no 
relationship  between  the  APU  and  those  events. 

The  third  level  of  the  MLD  identifies  more  specific  Shuttle 
functions  that  depend,  in  part,  on  APUs.  Succeeding  levels  extend 
this  breakdown  into  ever  more  specific  Shuttle  functions,  until 
specific  APU  system  failures  begin  to  appear  in  the  diagram  at 
levels  6 and  below.  In  some  of  the  simpler  diagrams,  APU  failures 
appear  earlier. 

Many  MLDs  were  developed  that  dealt  with  physical  processes  about 
which  there  is  some  uncertainty.  These  physical  processes  are 
related,  in  some  way,  to  the  top  events.  All  such  points  of 
uncertain  dependency  were  noted,  and  documented  in  the  form  of 
technical  issues  to  be  resolved.  Completion  of  the  final 
analysis  depended  on  resolution  of  these  issues  by  the  best  means 
available.  This  involved  in-house  analysis,  a data  search  for 
technical  references,  and  reliance  on  expert  opinion. 

MLDs  can  be  developed  to  any  level  of  detail  desired,  down  to  the 
smallest,  and  seemingly  most  insignificant  part,  to  show  possible 
failure  paths  that  lead  to  the  top  event.  The  purpose  of  the 
MLDs,  however,  was  not  to  delineate  all  failure  modes  that  could 
cause  the  top  events.  Their  purpose  was  to  identify  broad 
categories of initial failures, as discussed in Section 5, from
which to begin the more detailed identification of scenarios
and  the  delineation  of  failure  modes  (in  the  fault  trees) 
associated  with  the  scenarios. 

The  completed  MLDs  served  as  a reference  for  the  next  step  in 
the  analysis,  the  development  of  Event  Sequence  Diagrams,  as 
well  as  serving  as  a continuing  reference  source  through  the 
ensuing analysis process. Their importance in the PRA process
should not be underestimated.


6.2.2 MLD Descriptions


As a general rule, an MLD was developed for each damage state
defined  for  each  mission  phase.  The  intact  abort  damage  state 
for  ascent  (Phase  2)  was  further  subdivided  into  specific  abort 
modes,  and  an  MLD  was  developed  for  each.  This  served  to 
clarify  the  contribution  of  APU  failures  to  ascent  abort  modes. 

The  MLDs,  as  developed,  are  provided  in  Appendix  B6.2.  They  are 
outlined  in  Table  6.2-1  and  discussed  individually  below. 


Table 6.2-1
MLD DEFINITIONS

MLD  DAMAGE STATE          MISSION PHASE   DESCRIPTION

1    Loss of Crew/Vehicle  Phases 1 and 2  Prelaunch and Ascent
2    Loss of Mission       Phase 2         Return To Launch Site (RTLS) (Ascent)
3    Loss of Mission       Phase 2         Transatlantic Abort Landing (TAL) (Ascent)
4    Loss of Mission       Phase 2         Abort Once Around (AOA) (Ascent)
5    Loss of Mission       Phase 1         Launch Scrub (Prelaunch)
6    Loss of Crew/Vehicle  Phase 3         Orbit
7    Loss of Mission       Phase 3         Orbit
8    Loss of Crew/Vehicle  Phases 4 and 5  Entry and Post Landing


MLD 1 - Loss of Crew/Vehicle - Phases 1 and 2

MLD #1 (Appendix B6.2-1) depicts how APU failures can lead to
loss of crew and vehicle during the prelaunch and ascent phases.
The overall functional effects of APU failures contributing to
loss of crew and vehicle were determined to fit into three broad
categories: (1) loss of thrust; (2) loss of control; and (3)
loss of vehicle structural integrity. Of these three, only loss
of vehicle structural integrity applies to the prelaunch
timeframe. All three categories apply to the ascent phase.

For the prelaunch phase, loss of crew/vehicle scenarios involve
high-energy detonations of equipment in or near the aft
compartment, such as the Orbital Maneuvering System (OMS)
propellant tanks. One source of such a detonation could be shrapnel
from an APU turbine coming apart, or a fire from an APU fuel leak.
There may be other possible sources of OMS tank detonation, but
this study was only concerned with those possibilities emanating
from the APU. During ascent, the high-energy detonation failure
modes still apply, and other failure effects leading to loss of
crew and vehicle become possible. Included were: (1) Loss of
multiple hydraulic systems due to multiple APU failures. This is
shown in the MLD as loss of three hydraulic systems. As a
conservative groundrule, this was later changed to require loss of only
two, should the failure occur prior to Main Engine Cut Off (MECO);
(2) Loss of critical electronics due to APU exhaust leaks or due
to a fire resulting from APU fuel leaks. Fire was later
determined not to be credible during the prelaunch and ascent phases
due to the prelaunch aft compartment nitrogen purge; (3) Two
engines unable to throttle up after the "thrust bucket" due to
APUs failing during this critical period. Ascent performance
margins are also a factor here.


MLD 2 - Loss of Mission, RTLS - Phase 2

MLD #2 (Appendix B6.2-2) shows how failures of one or more APUs
can lead to an RTLS abort. Two scenarios are established. The
first involves loss of thrust during the initial part of ascent,
within which an RTLS can be accomplished (i.e., before "Negative
Return"). The operational effect of one APU shutting down during
ascent is the inability to change the thrust level (i.e., throttle
setting) of a main engine. Should one APU shut down during the
"thrust bucket" main engine throttling (generally 65% of full
throttle), the reduction of total thrust available to the launch
vehicle can lead to an RTLS abort. As was shown in MLD #1, this
is also dependent upon vehicle ascent performance margins for the
particular mission involved. The second scenario involves the
impending (predicted) loss of critical systems. In this case, the
Mission Control Center (MCC) invokes an RTLS abort in an effort to
return the vehicle to the launch area because of the impending loss
of two or three hydraulic systems, due to impending failures of two
or three APUs. Examples of impending failures are
fuel leaks or fuel tank pressurant gas leaks.


MLD  3 - Loss  of  Mission,  TAL  - Phase  2 

This MLD (Appendix B6.2-3) is similar to the RTLS MLD (MLD #2)
discussed above, except that the MCC abort mode invoked is a
Transatlantic Abort Landing. In the case of the stuck throttle
scenario, this means that for one engine stuck at the
"thrust bucket" thrust level as a result of an APU failure,
vehicle performance margins for this particular mission allow a
TAL to be achieved rather than an RTLS.

In the case of the impending APU failure scenario, more time is
available before the two or three hydraulic systems are lost than
was the case in MLD #2; i.e., the leaks are slower, allowing a
TAL to be achieved rather than an RTLS abort.

The TAL abort mode is considered safer and, therefore, more
desirable than the RTLS abort mode. However, because of the
flight path, the time before landing is longer. The flight rules
call for invoking the most desirable abort mode that the predicted
time to failure will allow.


MLD 4 - Loss of Mission, AOA - Phase 2

The only viable scenario in this MLD (Appendix B6.2-4) is an Abort
Once Around invoked by the MCC to return the vehicle before two or
three hydraulic systems are lost. This is similar to the TAL and
RTLS scenarios discussed above. However, in this case, the
impending loss of the hydraulic systems allows time for a 90-minute AOA
in preference to a less desirable TAL.
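
The selection rule that runs through the three abort MLDs (invoke the most desirable abort mode that the predicted time to failure will allow) can be sketched in Python as shown below. The preference order AOA, then TAL, then RTLS and the 90-minute AOA duration come from the text above. The TAL and RTLS durations are hypothetical placeholders, not flight-rule values, and the function name is an assumption for illustration.

```python
# Abort modes ordered from most to least desirable, with the minimum
# predicted time to failure (minutes) each mode is assumed to need.
ABORT_MODES = [
    ("AOA", 90.0),   # 90-minute AOA, as stated in the text
    ("TAL", 40.0),   # hypothetical placeholder duration
    ("RTLS", 25.0),  # hypothetical placeholder duration
]


def select_abort_mode(predicted_minutes_to_failure):
    """Return the most desirable abort mode that can be completed before
    the predicted failure, or None if no listed mode fits."""
    for mode, minutes_required in ABORT_MODES:
        if predicted_minutes_to_failure >= minutes_required:
            return mode
    return None


for minutes in (120.0, 60.0, 30.0, 10.0):
    print(minutes, select_abort_mode(minutes))
```

The sketch captures only the time-to-failure criterion. The vehicle performance margins and the point in ascent at which the failure occurs, which the MLD discussions also cite, are not modeled.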


MLD 5 - Loss of Mission, Launch Scrub - Phase 1

This MLD (Appendix B6.2-5) displays ways that APU failures or
anomalies can result in a launch scrub by violating the Space
Shuttle Launch Commit Criteria (LCC). Any of the APUs can shut
down, resulting in violation of the hydraulic pressure criteria
and an automatic launch scrub. An APU may also suffer a
performance degradation, which violates one of the APU performance
redlines and results in a manual launch scrub. Another possibility
is excessive use of APU fuel due to lengthy launch holds. These
launch holds could be caused by problems with APUs, or by problems
with any other launch vehicle or ground launch system.


MLD 6 - Loss of Crew/Vehicle - Phase 3

Failures on orbit can lead directly to loss of crew and vehicle, or
eliminate a function that is necessary for safe entry and landing.
Most APU-caused failures fall into the latter category. MLD 6
(Appendix B6.2-6) shows direct loss of vehicle resulting from loss
of control, or from high-energy detonations during orbit. Also
shown are failures that jeopardize safe entry and landing.

Included in this category is the loss of thrust necessary for the
deorbit burn (branch J is shown on MLD #8) due to failures of the
OMS and the Reaction Control System (RCS) backward-firing (+X)
jets. The diagram postulates damage to these systems due to APU
hot exhaust leaks or APU high-energy release. The high-energy
release category includes energetic shrapnel from the APU turbine
or gearbox. These apply during the Flight Control System (FCS)
checkout run only. It was later determined that the gearbox is
not a credible source of such high-energy shrapnel.

Other failures that affect entry and landing fall under the
category of loss of control. This includes loss of OMS/RCS control
and  loss  of  aerosurface  control.  APU  failures  that  can  lead  to 
these  conditions  include  hot  exhaust  leaks  that  can  damage 
electronics,  fluid  tanks  or  fluid  lines,  and  APU  fuel  leaks  which 
can  lead  to  fires  during  entry.  A fuel  fire  is  not  credible  on 
orbit  due  to  the  lack  of  ambient  oxygen. 

The  "loss  of  vehicle  structural  integrity"  category  postulates  an 
explosion  of  an  OMS  or  RCS  fuel  or  oxidizer  tank  due  to  APU  hot 
gas  leaks  or  shrapnel. 


MLD  7 - Loss  of  Mission  - Phase  3 

The  Loss  of  Mission  while  on  orbit  can  involve  either  a critical 
situation  requiring  entry  as  soon  as  possible,  a loss  of 
redundancy  requiring  entry  at  the  next  Primary  Landing  Site  (PLS) 
opportunity,  or  a loss  of  instrumentation  requiring  a Minimum 
Duration Flight (MDF). As can be seen in MLD #7 (Appendix B6.2-7),
various APU failures can contribute to these situations.

The impending failures that result in the need to enter as soon as
possible include impending loss of all three hydraulic systems due
to fuel tank leaks or fuel pressurant gas leaks. The objective in
this situation is to effect a landing before the malfunctioning
systems are totally lost. The same type of APU failures, if they
affect only one or two hydraulic systems, will result in a decision
to deorbit at the next PLS opportunity. This is to avoid a
prolonged orbit stay with critical system redundancy lost.

The declaration of an MDF considered here is the result of
instrumentation failures that affect insight into the status of the APUs.
It does not involve direct failures of the APUs themselves.


MLD 8 - Loss of Crew/Vehicle - Phases 4 and 5

This MLD (Appendix B6.2-8) depicts how APU failures can lead to
loss of crew/vehicle during the entry and post landing phases.
The overall functional effects of APU failures leading to loss of
crew/vehicle were determined to fit into three broad categories:
(1) loss of OMS/RCS deorbit thrust; (2) loss of control, i.e.,
OMS/RCS control, aerosurface control, or braking/steering rollout
control; and (3) loss of vehicle structural integrity.

All three categories apply to the entry phase. Loss of thrust no
longer applies after the deorbit OMS burn, and loss of control no
longer applies after Wheel Stop (WS). After WS only high energy
detonations caused by APU-generated shrapnel, fire, or hot exhaust
leaks can lead to loss of vehicle. APU failures after shutdown
were not considered in this study.

The  "loss  of  thrust"  category  of  entry  failures  is  identical  to 
that  discussed  for  MLD  #6.  It  involves  APU-caused  high  energy 
shrapnel  or  hot  gas  leaks  which  damage  the  OMS  and  RCS  systems. 

The  "Loss  of  Control"  category  is  also  similar  to  that  discussed 
under  MLD  #6,  but  with  the  additional  possibility  of  APU  fuel 
leaks  causing  destructive  fires  in  the  aft  compartment.  Other 
additions  to  this  category  include  loss  of  landing  gear  deploy 
before  touchdown,  and  loss  of  braking  and  steering  before  wheel 
stop.  This  is  assumed  to  result  in  loss  of  crew  and/or  vehicle. 
The  steering  and  braking  systems  depend  on  the  Orbiter's  hydraulic 
systems,  and  are  thus  vulnerable  to  APU  failures.  The  landing 
gear  deploy  system,  however,  has  a pyrotechnic  system  as  a backup 
to  the  hydraulic  system. 

The MLD also postulates damage to the vehicle structure due to
an  APU  fuel  tank  rupture,  or  an  OMS/RCS  fuel  or  oxidizer  tank 
explosion  caused  by  APU  shrapnel,  hot  gas  leaks,  or  fuel  fires. 


6.3  EVENT  SEQUENCE  DIAGRAMS 

Event  Sequence  Diagrams  (ESDs)  illustrate  sequences  of  events 
leading  from  initial  failure  categories,  defined  by  the  master 
logic diagrams, to damage states. They tell how an initial failure
(i.e., failure mode) causes a damage state (an effect). When
quantified  by  the  use  of  event  and  fault  trees,  the  scenarios  and 
the  events  within  the  scenarios  can  be  ranked  with  respect  to 
their  importance  to  a damage  state  such  as  loss  of  crew/vehicle. 


6.3.1  Interpretation  of  the  ESDs 

The  ESDs  were  developed  representing  five  mission  phases  in  four 
stages  as  follows: 


a.  Stage  1 represents  the  prelaunch  and  ascent  phases,  and 
includes  the  time  from  APU  start  at  TIG-5  minutes  to 
APU  shutdown  after  the  OMS-1  burn.  The  duration  of  this 
stage  was  taken  to  be  approximately  18  minutes. 

b.  Stage  2 represents  the  orbit  phase,  and  includes  the 
time  from  APU  shutdown  on  orbit  to  APU  start,  5 minutes 
before  the  deorbit  burn.  The  duration  of  this  stage 
was  assumed  to  be  about  5 days. 

c.  Stage  3 represents  entry,  descent,  and  landing  phases, 
and  includes  the  time  from  APU  start  before  the  deorbit 
burn to wheelstop. The duration of this stage was
taken  to  be  about  50  minutes. 

d.  Stage  4 includes  the  time  from  wheelstop  to  crew  egress, 
during  about  10  minutes  of  which  the  APU  continues  to  run. 
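
The stage definitions above can be tabulated in a short sketch. The following Python fragment is illustrative only: the names `APU_RUN_MINUTES`, `ORBIT_DURATION_MINUTES`, and `total_apu_run_minutes` are hypothetical and do not come from the study, and the zero APU run time for Stage 2 is an assumption based on the statement that the APUs are shut down on orbit.

```python
# Approximate APU running time per stage, in minutes, as quoted in the text.
# Stage 2 is set to zero on the assumption that the APUs stay shut down
# between the post-OMS-1 shutdown and the pre-deorbit start.
APU_RUN_MINUTES = {
    "Stage 1 (prelaunch and ascent)": 18,
    "Stage 2 (orbit)": 0,
    "Stage 3 (entry, descent, landing)": 50,
    "Stage 4 (after wheelstop)": 10,
}

# Stage 2 duration was assumed in the text to be about 5 days.
ORBIT_DURATION_MINUTES = 5 * 24 * 60


def total_apu_run_minutes(run_minutes=APU_RUN_MINUTES):
    """Sum the approximate APU running time over all four stages."""
    return sum(run_minutes.values())


if __name__ == "__main__":
    print("Approximate APU run time per mission (min):", total_apu_run_minutes())
    print("Approximate orbit stage duration (min):", ORBIT_DURATION_MINUTES)
```

Under these assumptions the APUs run for only a small part of the mission, which is consistent with the later simplification of the model to fewer stages.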


The ESDs were developed solely from the perspective of APU performance
during the mission. Interfacing systems and scenarios that
couple performance margins of other systems with the APU were
considered out of scope. For example, coupling the scenarios of
HPU failures with APU failures was not attempted in this study.

It should be pointed out that the ESDs discussed below model the
mission in four stages, rather than the two stages ultimately
employed for the final event tree modeling. The ESD development
process provided APU system insight, which allowed subsequent
model simplification without significant loss of modeling accuracy.

The  thought  process  employed  in  the  development  of  the  ESDs,  as 
discussed  below,  is  more  important  than  the  specific  model  stage 
in  which  the  scenarios  reside. 

The boxes in an ESD ask questions about the occurrence (or
non-occurrence) of a category of events. For example, the question in
Appendix B6.3-1, "Hydraulic System OK?", may represent
a large number of questions. Each question would refer to a component
in the hydraulic system. For example, one might ask if the
pump itself is OK. ESDs illustrate the overall flow of events that
lead from an initial APU failure to shuttle damage states such as
LOC/V and PLS entry. They are not meant to illustrate the detailed
logic that is involved in determining combinations of failure modes
that lead to APU failure. This is achieved in the split fraction
models described in Section 6.5.


6.3.1.1 Interpretation of Initial Failure Categories

The  questions  relating  to  the  initial  failure  categories  are 
found  in  the  boxes  across  the  top  of  the  ESD.  The  categories  are 
phrased as questions such that a successful event (i.e., no
initial  failure)  receives  a "yes"  answer  to  the  question  and  a 
horizontal  line  is  then  followed  to  the  next  event.  For  example, 
the  initial  failure  categories  of  equipment  failure,  turbine 
overspeed,  fuel  leakage,  and  exhaust  gas  leak  are  represented 
in  Appendix  B6.3-1  as  follows: 


a. No permanent APU failures? (equipment failures)

b. No recoverable APU failures? (equipment failures)

c. Turbine speed control OK? (turbine overspeed)

d. Fuel boundary remains intact? (fuel leak)

e. Exhaust gas boundary remains intact? (exhaust gas leak)


The  question  "hydraulic  system  OK?"  is  also  asked,  even  though  the 
hydraulic  system  is  out  of  the  scope  of  this  PRA,  to  demonstrate 
how  an  ESD  can  diagram  the  interdependencies  between  subsystems 
and  include  sequences  of  events  that  cross  subsystem  boundaries. 


A line pointing downward from an initial failure category indicates
that an initial failure has occurred (i.e., a "no" answer to the
question).

A sequence of boxes and lines that follow the arrows from initial
failure to a damage state is called a scenario. A success of the
APU occurs when, according to the principles of scenario structuring
described in Section 5, all the answers to the questions
across  the  top  (see  Appendix  B6.3-1)  are  "yes".  Since  the  boxes 
across  the  top  represent  a complete  set  of  initiating  failure 
categories,  then  in  the  absence  of  initiating  failures  the  APU 
must  have  operated  successfully.  Any  scenario  that  has  a vertical 
(down)  line  must,  therefore,  be  less  than  completely  successful. 

The  actual  "damage"  of  the  scenario  depends  on  the  number  and  type 
of  subsequent  failures  and  the  timing  of  these  failures.  The  ESD 
explicitly shows cascading failures associated with spatial interactions
as well as functional dependencies and independent failures.


6.3.1.2 Diagramming Dependencies in an ESD

An  example  of  a functional  dependency  is  shown  in  the  sequence 
initiated  by  a failure  of  the  hydraulic  system.  The  failure  mode 
is  one  that  causes  a hydraulic  pump  seizure  before  an  underspeed 
shutdown  can  occur.  This  situation  could  potentially  be  caused 
by a sudden large rupture of a hydraulic fluid line. Should a
seizure of the hydraulic pump occur, the kinetic energy of the
system could possibly cause a rupture of the APU turbine rotor.
This is represented by the question "APU turbine intact?" in
Appendix B6.3-1. Thus the APU turbine functionally depends on
avoidance of catastrophic hydraulic pump seizure. Of course a
more  obvious  functional  dependency  is  that  hydraulic  system  pump 
operation  depends  on  APU  operation. 

An  example  of  a scenario  that  includes  cascading  damage  is  shown 
if  the  APU  turbine  is  not  intact.  A negative  answer  to  the 
question  "APU  turbine  intact?"  means  that  the  turbine  rotor  has 
come  apart  and  the  pieces  have  not  been  contained  within  the 
turbine  housing.  In  that  situation,  the  APU  has  failed  and 
hydrazine  has  escaped  into  the  aft  compartment.  The  questions 
then concern whether the leak was isolated (say by secondary valve
or isolation valve closure), whether there is sufficient oxygen in
the  aft  compartment  to  support  combustion,  and  whether  the  other 
conditions  necessary  for  a fire  are  present. 

If  a fire  cannot  occur,  the  ESD  recognizes  that  damage  in  the  aft 
compartment  may  be  caused  by  shrapnel  from  the  turbine.  Other 
causes  of  damage  in  the  aft  compartment  may  be  from  detonation  of 
an  APU  resulting  from  the  heating  effects  of  the  decomposition 
reaction  of  hydrazine  with  materials  that  act  as  a catalyst, 
hydrazine reaction with electrical insulation causing open circuits
or hot shorts in "flight critical equipment", and even effects of
impingement  of  hot  gas  from  exhaust  duct  leakage  on  flight 
critical  equipment  or  APU  circuitry.  The  term  "flight  critical 
equipment"  is  defined  for  this  study  to  be  any  component  or  groups 
of  components  that  are  not  part  of  the  APU  or  HPU  and  whose  failure 
directly  causes  a LOC/V  in  conjunction  with  failures  in  the 
scenario.  If  a fire  can  occur,  then  it  is  also  recognized  as  a 
phenomenon that could cause the failure of other equipment in the
aft  compartment.  More  detailed  discussions  of  phenomena  relating 
to  cascading  damage  are  provided  in  Section  6.6. 


6.3.1.3 Modeling Spatial Interaction Events in an ESD

Spatial  interaction  events  (SIE)  denote  potential  failures  of 
equipment  by  virtue  of  their  spatial  proximity  to  phenomena  such 
as  fires,  shrapnel,  and  hydrazine  reactions  that  tend  to  cause 
cascading  damage. 

The  spatial  interaction  phenomena  considered  in  this  study  are  as 
follows: 

a.  Hydrazine  reaction  with  materials  in  the  aft  compartment 
causing  deterioration  of  either  wire  insulation  or  other 
material  in  the  aft  compartment  following  hydrazine 
leakage. 

b.  Exothermic  hydrazine  decomposition  reaction  in  an  oxygen 
poor  environment  following  hydrazine  leakage. 

c.  Fire  in  the  aft  compartment  caused  by  hydrazine  combustion 
following  hydrazine  leakage. 

d.  Shrapnel  caused  by  turbine  rotor  failure  at  either  normal 
speed  or  turbine  runaway  conditions. 

e. Detonations caused by compression of hydrazine bubbles, leakage
into solenoid cavities of the fuel isolation or control
valves, hydrazine overheating from fires, stuck-on heaters,
or hydrazine decomposition reactions, hot restarts without
gas generator cooling, and APU starts with gas generator
catalyst bed temperature or pressure too low.

f. Leakage of hot gas into the aft compartment caused by
exhaust duct failure.

The ESD also recognizes that certain failures may cascade and
cause  other  failures.  For  example,  shrapnel  generation  and 
detonations  will  often  cause  hydrazine  leakage  into  the  aft 
compartment  which,  in  turn,  could  result  in  either  a fire  or 
decomposition  reaction  which,  in  turn,  could  cause  another 
detonation,  etc.  A more  detailed  discussion  of  the  damage 
potential  of  these  events  is  found  in  Section  6.6. 

Below  the  SIE  in  Appendix  B6.3-1  is  a triangle  with  a Greek  or 
English  character  printed  within.  This  denotes  a transfer  to 
another  place  in  the  ESD  that  has  another  triangle  with  the  same 
character  within.  The  ESD  for  spatial  interaction  events  is 
found  on  page  2 of  Appendix  B6.3-1.  This  diagram  asks  questions 
concerning  the  number  of  APUs  that  have  failed  and  whether  flight 
critical  equipment  has  failed  as  a result  of  the  phenomena 
contributing  to  spatial  interactions. 

Page  2 of  Appendix  B6.3-1  first  asks  if  spatial  interaction  has 
failed  flight  critical  equipment.  Then  it  asks  if  two  APUs  have 
failed as a result of the initial failure and the spatial interaction.
The model assumes a LOC/V if either occurs. Finally,
the  ESD  asks  if  two,  one  or  no  APUs  have  failed  as  a result  of 
the  initial  failure,  spatial  interaction,  and  potential 
independent  failure  of  another  APU. 
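
The sequence of questions on page 2 of Appendix B6.3-1 can be sketched as a small decision function. This is a minimal illustration, not the study's model: the function name, argument names, and returned labels are hypothetical, and the sketch only reports the APU count for the final question because the damage state assigned to that count depends on rules discussed elsewhere in the report.

```python
def spatial_interaction_outcome(fce_failed, apus_failed_dependent,
                                apus_failed_independent=0):
    """Walk the three questions described for page 2 of the ESD.

    fce_failed: True if spatial interaction failed flight critical equipment.
    apus_failed_dependent: APUs failed by the initial failure plus the
        spatial interaction.
    apus_failed_independent: additional APUs failed independently.
    """
    # Question 1: has spatial interaction failed flight critical equipment?
    if fce_failed:
        return "LOC/V"
    # Question 2: have two APUs failed from the initial failure and the
    # spatial interaction? The model assumes LOC/V if so.
    if apus_failed_dependent >= 2:
        return "LOC/V"
    # Question 3: count APUs failed, including independent failures.
    # The orbiter has three APUs, so the count is capped at three.
    total = min(apus_failed_dependent + apus_failed_independent, 3)
    if total == 0:
        return "no APU failed"
    if total == 1:
        return "1 APU failed"
    return f"{total} APUs failed"


if __name__ == "__main__":
    print(spatial_interaction_outcome(False, 1, 1))
```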


6.3.1.4 Permanent/Recoverable Failures: Interpreting the Flight Rules

Page  2 of  Appendix  B6.3-1  indicates  that  the  damage  state  LOC/V 
would  occur  if  two  APUs  failed  during  ascent.  Flight  rules 
require  that  certain  APU  malfunctions  would  cause  the  Mission 
Control  Center  (MCC)  to  declare  an  APU  to  be  lost  for  the 
remainder  of  the  mission,  unless  it  was  needed  to  provide  a 
second  APU  for  landing.  These  malfunctions  are  called 
"recoverable  failures"  (RF)  to  distinguish  them  from  equipment 
failures  that  inherently  incapacitate  the  APU  in  such  a way  that 
it  cannot  be  recovered  during  the  mission.  The  latter  failures 
are called "permanent failures" (PF).

A fundamental  groundrule  for  this  study  was  that  permanent 
failures  of  two  APUs  any  time  during  the  mission  except  after 
wheelstop  would  be  considered  a LOC/V. 

The examples given so far show how an ESD diagrams functional
dependencies, cascading damage, and spatial interactions.
Independent failures are diagrammed in a similar manner.
Although the combination of two or more failures occurring
independently is probably of lower frequency than dependent
failures, the ESD recognizes their potential. The PRA assesses
the frequency of the scenarios by the use of event trees, split
fraction models and failure history data later in the study.

Suppose,  for  example,  that  an  APU  is  declared  lost  by  flight 
rules  because  of  a spurious  shutdown.  That  same  APU  could  also 
be  leaking  hydrazine.  Appendix  B6.3-1  represents  a declared 
lost APU by a vertical line under the box with the question
"No recoverable APU failures?". The "L" transfer then leads to
the next question, which is about whether the hydrazine fuel
boundary remains intact. A leakage in this scenario (one that
follows a spurious shutdown but with no other failures) would
be a second failure of the APU, occurring independently; that
is, not caused by or related to the spurious shutdown.


All  scenarios  in  the  APU  ESDs  ask  if  hydrazine  leakage,  or  exhaust 
gas  leakage,  or  both  can  occur.  This  recognizes  that  virtually 
any  APU  malfunction  or  failure  can  also  be  accompanied  by  the 
initial  failure  categories  of  hydrazine  and  exhaust  gas  leakage. 


The  ESDs  account  for  the  three  APUs  in  the  orbiter  and  they 
diagram  scenarios  in  which  failures  can  occur  in  more  than  one 
APU during the same mission. The shadow boxes of the initial
failure  categories  across  the  top  of  Appendix  B6.3-1  are  the 
diagrammatic  devices  used  to  illustrate  this.  The  diagram  is 
read  left  to  right  for  each  APU. 


In  summary,  ESDs  are  capable  of  illustrating  scenarios  that 
include  failures,  malfunctions,  flight  rule  considerations, 
multiple  sub-systems,  dependent  events,  cascading  damage,  spatial 
interactions, human actions, and damage states for each stage of
the  mission.  The  remainder  of  Section  6.3  describes  the  events 
found  in  the  APU  ESDs  for  Stage  1,  Stage  2,  Stage  3,  and  Stage  4. 
Since,  as  discussed  above,  hydraulic  system  failures  were 
included  for  illustrative  purposes  only,  the  following  discussion 
will not include hydraulic system-initiated scenarios.

6.3.2 Stage 1: Prelaunch and Ascent (Mission Phases 1 and 2)

The  ESD  in  Appendix  B6.3-1  covers  the  mission  between  5 minutes 
before  liftoff  when  the  APU  starts  prelaunch,  and  when  the  APU 
shuts  down  following  the  OMS-1  orbital  insertion  burn. 

6.3.2.1 Scenarios Initiated by Permanent APU Failures

This  initiating  failure  category  includes  a number  of  failures  of 
APU  equipment  that  are  not  recoverable  during  the  mission.  These 
would  include,  for  example,  failure  to  start  the  APU,  failures  of 
pump,  valves,  turbine,  and  gearbox  to  continue  running,  lube  oil 
system plugging, fuel line plugging, and underspeed shutdown. It
would also include failure to successfully shut down an APU after
MECO. A complete description of all initiating failures included
in the model of this category is presented in Section 6.5.2. This
category does not include hydrazine leakages to the aft compartment
or into valve solenoid cavities. It does not include turbine
runaway  events  and  events  that  would  cause  MCC  to  declare  an  APU 
lost  when  it  is  still  potentially  operable. 

Two specific pieces of equipment, the gearbox and the turbine,
have  been  singled  out  for  additional  attention  in  the  diagram 
because  certain  failure  modes  of  these  components  could 
potentially  lead  to  spatial  interaction  events.  The  following 
describes  the  scenarios  in  Appendix  B6.3-5  that  are  beneath  the 
box  with  the  question:  "No  permanent  failures?". 

The  next  event  beneath  this  category  asks  if  the  gearbox  is  OK. 
This  event  includes  all  failure  modes  of  the  gearbox.  A negative 
answer  to  this  question  could  mean  that  the  gearbox  has  failed 
in  a way  that  could  cause  rapid  seizure  of  the  turbine  shaft. 
Therefore,  the  question:  "APU  turbine  remains  intact?"  is  asked. 

A negative  answer  means  that  the  gearbox  failure  may  (or  may  not) 
have  caused  an  energetic  failure  of  the  turbine  rotor  with 
subsequent  escape  of  the  pieces  from  the  APU  housing.  If  the 
gearbox  is  OK,  then  the  ESD  asks  about  independent  turbine 
failure  at  normal  turbine  speed.  If  the  APU  turbine  remains 
intact,  then  the  diagram  shows  that  a permanent  failure  (PF)  has 
occurred  and  transfers  to  questions  about  leakage. 

If  the  turbine  does  not  remain  intact,  the  same  questions  related 
to  cascading  failure  phenomena  and  spatial  interaction  events  as 
those described in Sections 6.3.1.2 and 6.3.1.3 become relevant
in  order  to  describe  the  various  sequences  of  events  that  could 
arise  from  turbine  failure.  Tracing  through  the  ESD  from  page  1 
of  Appendix  B6.3-1  to  page  2 of  that  figure  and  B6.3-2,  the 
diagram  recognizes  that,  indeed,  further  damage  might  not  occur 
to  other  APUs  and  flight  critical  equipment,  leaving  only  the 
initial  failure  of  an  APU.  It  is  also  recognized  that  subsequent 
failures  occurring  as  a consequence  of  shrapnel  or  leaking 
hydrazine  could  lead  to  a LOC/V. 


6.3.2.2 Scenarios Initiated by Recoverable APU Failures


This initiating failure category includes those APU malfunctions
that are included in the flight rules as reasons for MCC to declare
an APU lost, but that leave the APU potentially operable should a
second APU be required for landing. This category excludes
hydrazine leakages; those have been assigned their own initial
failure category. The recoverable failure category includes the
following malfunctions:


a. Underspeed or overspeed shutdowns that can be unambiguously
identified as spurious. That is, they are caused by electrical
or instrument malfunction that causes the APU controller
to close the secondary control valve in an otherwise
successfully operating APU.

b. Gas generator bed temperature cannot be maintained above
70 °F for an APU start.

c. The lube oil outlet pressure is greater than 150 psia during
APU operation.

d. The pressure drop between the gearbox and the lube oil
outlet is less than 20 psi during APU operation.

e. The lube oil outlet temperature is greater than 375 °F or the
gearbox bearing temperature is greater than 400 °F.

f. Turbine speed cannot be maintained between 95% and 121%
while running.

g. Gearbox pressure is less than 2 psia before APU start.

None  of  these  malfunctions  have  been  singled  out  as  a credible 
precursor  to  spatial  interaction  events;  therefore,  the  ESD 
transfers  to  questions  about  leakage. 
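
The limits in items a through g above can be collected into a simple check. The sketch below is a hedged illustration only: the function and parameter names are hypothetical, readings that are not supplied are simply skipped, and the treatment of values exactly at a limit (for example a bed temperature of exactly 70 °F) is an assumption, since the text does not specify it.

```python
from typing import List, Optional


def recoverable_failure_flags(
    spurious_shutdown: bool = False,                      # item a
    gg_bed_temp_f: Optional[float] = None,                # item b, before start
    lube_outlet_press_psia: Optional[float] = None,       # item c
    gearbox_to_lube_dp_psi: Optional[float] = None,       # item d
    lube_outlet_temp_f: Optional[float] = None,           # item e
    gearbox_bearing_temp_f: Optional[float] = None,       # item e
    turbine_speed_pct: Optional[float] = None,            # item f
    gearbox_press_psia_prestart: Optional[float] = None,  # item g
) -> List[str]:
    """Return the letters of the listed criteria met by the readings."""
    flags = []
    if spurious_shutdown:
        flags.append("a")
    if gg_bed_temp_f is not None and not gg_bed_temp_f > 70:
        flags.append("b")
    if lube_outlet_press_psia is not None and lube_outlet_press_psia > 150:
        flags.append("c")
    if gearbox_to_lube_dp_psi is not None and gearbox_to_lube_dp_psi < 20:
        flags.append("d")
    hot_lube = lube_outlet_temp_f is not None and lube_outlet_temp_f > 375
    hot_bearing = (gearbox_bearing_temp_f is not None
                   and gearbox_bearing_temp_f > 400)
    if hot_lube or hot_bearing:
        flags.append("e")
    # Inclusive band limits are an assumption of this sketch.
    if turbine_speed_pct is not None and not 95 <= turbine_speed_pct <= 121:
        flags.append("f")
    if (gearbox_press_psia_prestart is not None
            and gearbox_press_psia_prestart < 2):
        flags.append("g")
    return flags


if __name__ == "__main__":
    print(recoverable_failure_flags(lube_outlet_press_psia=160,
                                    turbine_speed_pct=125))
```

Any non-empty result corresponds to a malfunction for which, per the text, MCC would declare the APU lost while it remains potentially operable.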


6.3.2.3 Scenarios Initiated by Turbine Speed Control Failure Category

This  initial  failure  category  includes  all  failures  that  cause  an 
overspeed  of  the  APU  turbine.  The  combinations  of  control  valve, 
controller,  electric  power  and  other  failures  contributing  to 
turbine  overspeed  are  in  the  split  fraction  models  described  in 
Section 6.5.2.1.

In general, it appears that both the primary and secondary control
valves must fail in the open position to cause an overspeed.
Closure of the isolation valves is not sufficient to prevent an
overspeed because enough hydrazine is present downstream of these
valves to continue powering the turbine. It also appears that a
single failure of the secondary valve stuck in mid position will
not cause an overspeed because most of the fuel is directed back
to the pump inlet. It was determined that a failure of the primary
valve seat such that the seat dislodges and keeps the secondary
valve from closing is, first, highly unlikely and, second, more
likely to block the flow path and cause an underspeed shutdown
than to cause an overspeed. Therefore, this event was included in
the assessment of fuel line plugging as part of the permanent
failure category.

Should  an  overspeed  condition  occur,  then  an  APU  overspeed  trip 
can  prevent  catastrophic  turbine  runaway.  This  is  questioned 
in the box "overspeed trip avoids runaway?". If the answer
is  positive,  then  the  ESD  asks  about  fuel  leakages  that  are 
independent  of  the  overspeed  event.  If  overspeed  trip  is  not 
successful,  then  the  turbine  speed  would  be  expected  to  reach 
over  136,000  rpm  in  about  200  milliseconds.  At  this  speed, 
the  APU  turbine  is  unlikely  to  remain  intact.  The  expected 
event  is  that  the  turbine  rotor  would  come  apart  in  a small 
number  (e.g.,  three)  of  pieces  and  the  pieces  would  not  be 
contained  by  the  containment  ring,  nor  by  the  turbine  housing 
itself.  Shrapnel  would  enter  the  aft  compartment  accompanied 
by  hydrazine  which  would  escape  the  APU  through  the  holes 
created  by  the  pieces  of  turbine  rotor.  The  shrapnel  tends 
to spray in a pattern that subtends a 30° arc centered on the
turbine  wheel  plane  of  rotation. 

Some  of  the  shrapnel  could  be  energetic  enough  to  puncture  the 
large  cryogenic  liquid  oxygen  and  liquid  hydrogen  lines  that  are 
within  the  spray  pattern  of  the  turbine  shrapnel.  If  the  outer 
shell  and  inner  lining  of  these  fuel  lines  are  punctured,  the 
results  expected  are  overpressurization  of  the  aft  compartment 
because  of  the  vaporization  process  or  the  explosive  chemical 
reaction  of  oxygen  and  hydrogen  causing  a loss  of  structural 
integrity  to  the  vehicle.  Shrapnel  could  also  be  sufficiently 
energetic to damage flight critical electrical/electronic
equipment in the aft compartment, other compartment-mounted
equipment,  as  well  as  the  APU  fuel  tanks.  Shrapnel  penetration 
of  the  OMS  deck  is  also  possible. 

Hydrazine leakage would not be expected to ignite in the
aft compartment during ascent because the compartment is purged
with nitrogen and low atmospheric oxygen conditions are quickly
attained as the shuttle gains altitude. However, hydrazine is
[text illegible in source] electrical insulation or [text illegible
in source]. These scenarios have been summarized on [text illegible
in source]. A more detailed discussion about individual
phenomena is presented in Section 6.6.


6.3.2.4 Scenarios Initiated by Hydrazine Leakage

This initial failure category includes hydrazine leakage from any
part of the APU into the aft compartment, into the fuel pump seal
cavity, or into the isolation valve or control valve solenoid
cavities. The scenarios resulting from hydrazine leakage follow a
negative answer to the question "Fuel boundary remains intact?" and
are shown on page 3 of Appendix B6.3-1 and described below.

If the leaking APU has not itself failed; i.e., a negative
response to the question "Leaking APU failed from other cause?",
the ESD asks if any other APU has failed. It asks this because
flight rules indicate that different responses are required if
an APU has already failed. If no other APU has failed (the
expected situation), then the question "Leak detected and APU
shutdown before fuel quantity and tank pressure depleted?" is
asked. Negative answers to this question include the following
scenarios:

a. Leak is not detected and APU fuel quantity is depleted or
tank pressure drops below 70 psi before APU is shut down.
This represents a permanent failure of an APU and would
probably release a great deal of hydrazine into the aft
compartment.

b. Leak is detected but the leak is so large that the fuel
quantity is depleted or tank pressure drops below 70 psi
before MECO (flight rules do not allow an APU to be shut
down before MECO for a fuel leak). This represents a
permanent failure of an APU and would probably also release
a great deal of hydrazine into the aft compartment.

c. Hydrazine leaks into one of the valve solenoid cavities,
decomposes, causes a pressure increase inside the valve
and eventually ruptures the valve. If this occurs in an
isolation valve, the entire contents of the fuel tank could
be dumped into the aft compartment. This would certainly
be a permanent failure of an APU with a substantial chance
of damaging flight critical equipment or a second APU. If
the rupture occurs in one of the control valves, then the APU
would be failed, but the amount of hydrazine released into
the aft compartment would be limited unless an isolation
valve also failed to close. An underspeed shutdown of the
APU would command the isolation valves to close.

Positive answers to the question: "Leak detected and APU shutdown
before fuel quantity and tank pressure depleted?" include the
following scenario: The leak is detected and the APU is shut down
post-MECO with sufficient fuel and tank pressure to complete the
mission. In this situation, the ESD asks if the leak is
successfully isolated. This question refers to two situations.
A leak downstream of the isolation valves will be isolated only
if both isolation valves close. A leak upstream of the isolation
valves cannot be isolated. An isolated leak is treated as a
recoverable failure (RF). A leak that cannot be isolated is a
permanent failure. If no other APU has failed, then flight rules
require that the APU be restarted and run to fuel depletion. If
another APU has failed, then this requirement is waived so that
the leaking APU may be available for landing.
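
The isolation logic in the preceding paragraph can be expressed compactly. The following Python sketch is an illustration under stated assumptions: the function name and arguments are hypothetical, and it covers only the RF/PF classification of a detected leak, not the flight rule on restarting the APU.

```python
def classify_detected_leak(leak_upstream_of_isolation_valves: bool,
                           isolation_valve_1_closed: bool,
                           isolation_valve_2_closed: bool) -> str:
    """Classify a detected hydrazine leak as RF or PF.

    Returns "RF" (recoverable failure) if the leak is isolated,
    otherwise "PF" (permanent failure).
    """
    # A leak upstream of the isolation valves cannot be isolated.
    if leak_upstream_of_isolation_valves:
        return "PF"
    # A downstream leak is isolated only if both isolation valves close.
    if isolation_valve_1_closed and isolation_valve_2_closed:
        return "RF"
    return "PF"


if __name__ == "__main__":
    print(classify_detected_leak(False, True, True))
```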

If the answer to the question: "Other APU already failed?" is
affirmative, then there are fewer options and fewer scenarios than
discussed above. In this situation, there is one APU failed and
one leaking. Flight rules direct either a landing at the next PLS
opportunity, if the leaking APU can support the required run time,
or an intact abort if the leaking APU can support only a limited
duration of flight. If the answer to the question: "Remaining
fuel quantity sufficient to support landing?" is negative, then a
LOC/V would result.

If the answer to the question: "Leaking APU failed from other
cause?" is affirmative, then only questions concerning the
potential of spatial interactions need to be asked because the
leaking APU has failed.

All  leak  scenarios  shown  on  page  3 in  Appendix  B6.3-1  lead  to 
questions  about  the  potential  for  fire.  These  questions  are  asked 
to  complete  the  qualitative  development  of  scenarios  even  though 
their likelihood of occurrence is negligible during ascent owing
to nitrogen purging of the aft compartment. After the questions
concerning fire, the ESD asks questions about the spatial interaction
events that were described in Sections 6.3.1.2 and 6.3.1.3
above.


6.3.2.5 Scenarios Initiated by Exhaust Gas Leakage


This category includes failures in the exhaust gas duct or turbine
housing that allow hot gas to flow into the aft compartment.


Damage to APUs and flight critical equipment may be caused in
two  ways.  First,  hot  gas  impingement  on  electronic  equipment 
may  cause  component  failures.  Second,  a very  large  leak  could 
potentially  overpressurize  the  aft  compartment  and  lead  to 
sidewall  or  bulkhead  failure  or  hydrogen  detonation.  Section 
6.6  discusses  these  phenomena  in  more  detail. 


Since  exhaust  gas  leakage  itself  does  not  inherently  cause  failure 
of  an  APU,  the  ESD  models  all  potential  scenarios  from  this 
initial  failure  category  as  spatial  interaction  events  on  page  2 
of Appendix B6.3-1. These have been described in Section 6.3.1.3.


6.3.2.6 Defining the Damage States for Prelaunch and Ascent

Page  4 of  Appendix  B6.3-1  is  reached  after  scenarios  for  all 
three  APUs  have  been  checked.  This  is  indicated  by  the  transfer 
triangle with the letters AD within. The objective of this part of
the  ESD  is  to  determine  the  appropriate  damage  state  that  should 
be  assigned  to  the  previous  sequences  of  events  covering  the 
three  APUs.  If  any  failures  occur  or  any  redlines  are  violated 
before  launch,  then  the  scenario  would  be  associated  with  a 
launch scrub. If an APU fails after launch (a yes answer to
"Has liftoff occurred?"), then questions regarding the time or
altitude  become  relevant  for  determining  the  damage  state.  If 
a failure  occurs  any  time  during  ascent  except  in  the  "thrust 
bucket",  the  appropriate  action  is  for  the  shuttle  to  continue 
to  orbit,  deploy  any  deployable  payloads,  and  enter  at  the  next 
primary  landing  site  opportunity.  This  is  termed  "PLS"  in  the 
ESD.  An  APU  failure  in  the  thrust  bucket  has  been  assumed  to 
result  in  an  intact  abort. 

Sequences of events that lead to one APU failed and one impending
APU failure (e.g., leaking hydrazine into the aft compartment)
are assumed to lead to a PLS if the impending failure is not
projected to occur before wheelstop. If, in the estimation of
MCC, the impending failure will not support a PLS, then an intact
abort is the assumed damage state. These assumptions are
consistent with the stated mission flight rules. The type of abort
called  by  the  MCC  depends  on  the  altitude  and  flight  performance 
margins  at  that  point  in  the  mission.  The  possible  options  are 
abort  to  orbit,  abort  once  around,  return  to  launch  site,  or 
transatlantic  abort  landing.  Of  course,  an  impending  failure 
that  will  not  support  either  a PLS  or  an  intact  abort  is  actually 
a permanent  failure  and,  when  coupled  with  another  failure,  is 
assumed  to  lead  to  a LOC/V.  A spurious  shutdown  of  an  APU  before 
MECO  was  assumed  to  have  the  same  effect  as  a permanent  failure 
when determining damage states. If no APU failures occur but
instrumentation  supporting  APU  telemetry  has  failed,  then  the 
flight  rules  direct  the  MCC  to  declare  a minimum  duration  flight. 
The  success  or  failure  of  such  instrumentation  was  beyond  the 
scope  of  this  study's  quantitative  assessment. 
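
The damage-state logic described above can be summarized as a small decision function. The following is a minimal sketch, not the study's model: the function name, argument names, and the encoding of an impending failure are hypothetical. It also makes two assumptions the text does not state: a lone impending failure (no other APU failed) is assigned the state its remaining capability supports, and telemetry loss before liftoff is not treated as a failure.

```python
from enum import Enum


class DamageState(Enum):
    OK = "OK"
    LAUNCH_SCRUB = "launch scrub"
    INTACT_ABORT = "intact abort"
    PLS = "enter at next PLS opportunity"
    MDF = "minimum duration flight"
    LOC_V = "LOC/V"


def ascent_damage_state(liftoff, failed, impending=None,
                        in_thrust_bucket=False, redline_violated=False,
                        telemetry_lost=False):
    """Hypothetical mapping of a Stage 1 scenario to a damage state.

    failed    -- APUs permanently failed (spurious shutdowns before MECO count)
    impending -- None, or what the impending failure still supports:
                 "PLS", "abort", or "neither"
    """
    if not liftoff:
        # Any failure or redline violation before launch scrubs the launch.
        if failed or impending is not None or redline_violated:
            return DamageState.LAUNCH_SCRUB
        return DamageState.OK

    # An impending failure that supports neither a PLS nor an intact abort
    # is counted as a permanent failure.
    if impending == "neither":
        failed += 1
        impending = None

    if failed >= 2:
        return DamageState.LOC_V
    if impending == "PLS":
        return DamageState.PLS
    if impending == "abort":
        return DamageState.INTACT_ABORT
    if failed == 1:
        # A single failure in the thrust bucket is assumed to give an abort.
        return DamageState.INTACT_ABORT if in_thrust_bucket else DamageState.PLS
    if telemetry_lost:
        return DamageState.MDF
    return DamageState.OK
```

For example, `ascent_damage_state(True, 1, impending="neither")` evaluates to `DamageState.LOC_V`, matching the statement that an unsupportable impending failure coupled with another failure is assumed to lead to a LOC/V.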


6.3.3 Stage 2: Orbit (Mission Phase 3)

The ESD presented in Appendix B6.3-2 describes APU related
scenarios on orbit in terms of three time intervals:

a. After APU shutdown and before FCS checkout (page 1 of
Appendix B6.3-2). The APU is not operating but must
perform heating and cooling functions and maintain
system integrity.

b. During FCS checkout (page 2 of Appendix B6.3-2), one APU
is run for about 3 to 10 minutes in order to provide
power to check out the hydraulically-actuated
aerosurfaces in preparation for entry.

c.  After  FCS  checkout  (page  4 of  Appendix  B6.3-2),  the  APU 
is  not  operating  but  must  perform  heating  and  cooling 
functions  and  maintain  system  integrity. 


6.3.3.1 Scenarios Initiated by Failure of a Fuel Isolation Valve to De-energize

Should  a fuel  isolation  valve  fail  to  de-energize  after  APU 
shutdown,  the  crew  follows  Flight  Rule  10-11C.  If  power  is  not 
removed  from  the  valve  solenoid  within  about  20  minutes,  a local 
detonation  of  stagnant  hydrazine  may  occur  due  to  overheating. 
The crew restarts the APU and attempts two underspeed shutdowns
[...] valves. If the valves do not de-energize after the
[...] continue running until fuel depletion.

The ESD models this situation by first asking: [...]
injector [...] Appendix B6.3-1 except that the potential
for fire is not shown for orbit.

A successful hot restart without detonation leads to the question:
[...] A positive answer means [...] the ESD then asks about the next
potential initial failure. A negative answer leads to a series of
[...] concerning possible failures of the APU while it is running to
fuel depletion. [...] allow local hydrazine heatup because [...]
transport afforded by flowing hydrazine. However, any running APU is
always subject to the same initial failure modes found on [...]
similar to those of ascent shown on page 1 of Appendix B6.3-1.
Should the APU shut down at any time before [...] interaction [...]
or exhaust gas leakages.


6.3.3.2 Scenarios Initiated by Hydrazine Overheating After Shutdown


Heat from the hot [...] toward the fuel pump and gas
generator valves via thermal conduction because convective heat
transfer does not occur on orbit in the aft compartment and heat
transfer away from the APU by radiation is [...]
stagnant hydrazine in the fuel pump [...] formation of bubbles.
The fuel pump/GGVM cooling system [...]
minutes after shutdown in orbit.

The question: "Hydrazine does not overheat after shutdown?" is
answered affirmatively if the fuel pump/GGVM cooling system
operates  successfully.  In  that  case,  the  ESD  leads  to  questions 
about  other  initial  failure  categories.  If  cooling  fails,  then 
the  question:  "Overheating  does  not  cause  detonation?"  is  asked. 

An  affirmative  answer  means  that  detonation  has  not  occurred  and 
the  failure  is  considered  recoverable. 

There is a possibility of detonation if the APU is started while
hydrazine temperatures are above 200°F. If the hydrazine is
allowed to cool to below 200°F before start, no detonations are
expected.

An  APU  with  a failed  fuel  pump/GGVM  cooling  system  is  considered 
recoverable  if  needed  during  entry,  descent  and  landing. 

If  the  answer  to  the  question:  "Overheating  does  not  cause 
detonation"  is  negative,  then  the  spatial  interaction  questions 
are  asked  with  consideration  to  detonation,  shrapnel,  hydrazine 
decomposition,  and  chemical  attack. 


6.3.3.3 Scenarios Initiated by Overcooling After Shutdown

Heating of the APU fuel lines, water lines, lube oil lines, and gas
generator is provided during orbit to maintain hydrazine, oil,
and water above minimum acceptable temperatures. Gas generator heating
is required to assure an acceptable temperature for APU startup.
Failure to maintain water temperature in the fuel pump/GGVM cooling
system above freezing is considered to be a failure mode of the
GGVM and fuel pump cooling system and is included in the failures
discussed in the previous section. Flight rules call for the APU
to be considered lost under the following conditions:

a. Fuel tank or fuel line temperature less than or equal to 35°F

b. Fuel pump temperature less than or equal to 35°F

c. Lube oil temperature less than or equal to 0°F

d. GGVM temperature less than or equal to 35°F

Hydrazine freezes at 35°F. If a portion of the APU has frozen,
and subsequently heats up, local uneven thawing could cause a
line rupture (hydrazine expands when thawing). Lube oil loses
its fluidity at 0°F and an APU start at low temperatures could
cause  gear  bearings  to  overheat.  However,  these  failures  are 
believed  to  be  recoverable  if  a second  APU  is  absolutely  needed 
to  avoid  landing  with  a single  APU.  They  are  not  considered  to 
be  causes  of  spatial  interaction  events. 
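
The four overcooling limits listed above lend themselves to a simple table lookup. The following sketch is illustrative only: the measurement names and function names are hypothetical, and only the numeric limits come from the text.

```python
# Flight-rule limits quoted in the text, in degrees Fahrenheit.
# The dictionary keys are hypothetical measurement names.
LOSS_LIMITS_F = {
    "fuel_tank_or_line": 35.0,
    "fuel_pump": 35.0,
    "lube_oil": 0.0,
    "ggvm": 35.0,
}


def overcooling_violations(temps_f):
    """Return the sorted names of measurements at or below their limit."""
    return sorted(
        name
        for name, limit in LOSS_LIMITS_F.items()
        if name in temps_f and temps_f[name] <= limit
    )


def apu_considered_lost(temps_f):
    """An APU is considered lost if any single limit is reached."""
    return bool(overcooling_violations(temps_f))
```

Because the limits are "less than or equal to", a fuel pump reading of exactly 35°F counts as a violation in this sketch.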




6.3.3.4 Scenarios Initiated by Hydrazine Leakage Before FCS Checkout

Fire  scenarios  are  not  relevant  for  hydrazine  leakage  on  orbit. 

The other hydrazine related phenomena discussed in Sections
6.3.1.2, 6.3.1.3, 6.3.2.4, and 6.6 are relevant to orbit. Unlike
ascent, however, an APU with an isolatable leak could be
restarted and run to fuel depletion if no other APUs have failed.

If  leakage  occurred  and  was  detected  before  APU  shutdown  during 
ascent,  then  the  ability  to  isolate  the  leak  is  assessed  soon 
after  APU  shutdown.  If  the  leak  can  be  isolated,  and  sufficient 
fuel  and  tank  pressure  remain  to  complete  a landing,  then  the 
APU  is  considered  recoverable.  Otherwise,  the  APU  is  considered 
permanently  failed.  In  either  case,  the  APU  is  considered  lost 
and the flight rules require a landing at the next PLS opportunity.
Spatial interaction event questions are asked to complete the
scenario. 


If the leak is not isolatable, then the question: "APU fuel
quantity and tank pressure can support start and landing?" is asked.

A landing  at  the  next  PLS  opportunity  is  required  by  flight  rules. 
If  the  fuel  is  insufficient  and  this  is  the  first  APU  to  exhibit 
a permanent  or  recoverable  failure,  the  APU  will  be  started,  and 
run  to  fuel  depletion.  If  another  APU  has  already  been  lost,  then 
the  ESD  leads  to  the  spatial  interaction  questions  on  page  6 of 
Appendix  B6.3-2.  If  the  fuel  is  sufficient  and  another  APU  has 
been declared lost, the APU will not be restarted, but thermal
conditioning in preparation to support entry and landing will
occur.  Spatial  interaction  questions  are  asked  following  all 
unisolatable  leaks. 

Unisolatable APU leakage occurring after APU shutdown while in
orbit is described in the ESD, with the same scenarios as
described above for unisolatable leaks occurring before APU
shutdown. After APU shutdown, leaks that occur downstream of the
isolation valves would release only a limited amount of
hydrazine. In fact, the leak may even seal itself until entry.
Scenarios initiated by these isolatable leaks are treated in the
Stage 3 ESD.

An APU with unisolatable leaks that cannot support landing is
restarted and run to fuel depletion if no other APU has been
declared lost. This may involve a hot restart, so the question
"hot restart without detonation?" is asked at the bottom of page
[...] of Appendix B6.3-2. This question involves failure of the injector
cooling system. Failures of an APU that would cause a spurious
start of the APU while the injectors are still hot are also
included in this question and in the subsequent questions on pages
5 and 6 of Appendix B6.3-2.

The  sequences  of  events  related  to  failures  during  APU  operation 
while  running  to  fuel  depletion  are  presented  on  page  5 of 
Appendix  B6.3-2.  They  are  similar  to  the  Stage  1 sequences  with 
the  following  exceptions: 

a.  Recoverable  failures  are  irrelevant. 

b. All sequences lead to questions concerning spatial
interactions. The outcome of spatial interaction questions is
either LOC/V or one APU permanently failed.

c.  Questions  about  fires  are  not  asked,  but  the  potential  for 
hydrazine  to  remain  frozen  in  the  aft  compartment  and 
either  combust  or  decompose  to  cause  further  damage  during 
descent  is  recognized. 


6.3.3.5 Scenarios During FCS Checkout

FCS  checkout  is  performed  if  no  APU  has  failed  or  been  declared 
lost  up  to  that  time  in  the  mission.  Page  2 of  Appendix  B6.3-2 
shows  the  scenarios  related  to  FCS  checkout.  Shadow  boxes  are 
not  shown  because  only  one  APU  is  used  for  FCS  checkout. 

If  the  running  APU  fails,  then  the  ESD  questions  whether  it  also 
exhibits  a leak.  An  isolated  leak  would  release  a limited  amount 
of  hydrazine  into  the  aft  compartment.  An  unisolated  leak  would 
be  a much  larger  threat  to  flight  critical  equipment  or  a second 
APU  during  descent. 

If  the  running  APU  exhibits  a recoverable  failure  or  does  not  fail 
at  all,  then  the  ESD  asks:  "fuel  boundary  remains  intact?".  Page  3 
of  Appendix  B6.3-2  shows  the  leakage  scenarios.  If  the  leak  is 
severe  enough  that  fuel  quantity  or  tank  pressure  can  no  longer 
support a start and landing, then the APU is considered
permanently failed and the ESD refers to possible subsequent failures
associated  with  spatial  interactions.  If  the  leak  is  small  enough 
that  fuel  quantity  and  tank  pressure  remain  sufficient  after  APU 
shutdown  and  the  leak  is  isolated,  then  the  failure  is  recoverable 
and  spatial  interaction  questions  are  asked.  If  the  leak  is  not 
isolated,  the  APU  is  restarted  and  run  to  fuel  depletion. 

Questions  about  hot  restart  without  detonation  and  subsequent 
potential spatial interactions are then asked.

6.3.3.6 Scenarios Following FCS Checkout

The APU used for FCS checkout must successfully shut down, cool,
and maintain fluid system temperatures above minimums. The
other APUs must continue to maintain temperatures above minimums
before and during FCS checkout. Scenarios associated with these
functions are shown on page 4 of Appendix B6.3-2 and are
essentially identical to those shown on page 1 of that ESD.

6.3.3.7 Defining Damage States for Orbit

A negative response to the question on page 1 of Appendix B6.3-2,
"no APUs failed or declared lost by mission rules?", indicates
that FCS checkout will not be performed and the portion of the
ESD labeled "deorbit discriminator" is entered. The deorbit
discriminator is also entered from page 4 of Appendix B6.3-2
after scenarios that deal with the post-FCS checkout period.
The deorbit discriminator is found on page 7 of Appendix B6.3-2
and defines the damage states for each scenario in Stage 2. If
one APU is lost either permanently or by flight rules, a landing
at the next PLS opportunity is assumed. If two APUs are
permanently failed, a LOC/V is assumed. If all APUs are OK but the
MCC loses the ability to monitor APU status more than 72 hours
prior to deorbit, then a minimum duration flight is declared. If
loss of ability to monitor APU status occurs within 72 hours of
deorbit, the mission proceeds normally. Otherwise, all three APUs
are considered OK to support entry.
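
The deorbit discriminator rules can be expressed as an ordered set of checks. This is a hedged sketch rather than the study's implementation: the function and argument names are hypothetical, and the handling of two APUs lost with fewer than two permanently failed (treated here as a PLS) is an assumption, since the text does not address that case.

```python
def orbit_damage_state(permanently_failed, lost_by_rules=0,
                       monitor_lost_hours_before_deorbit=None):
    """Hypothetical Stage 2 damage-state assignment.

    permanently_failed -- number of APUs permanently failed
    lost_by_rules      -- number of APUs declared lost by flight rules only
    monitor_lost_hours_before_deorbit -- None if APU monitoring is intact,
        otherwise hours remaining to deorbit when monitoring was lost
    """
    if permanently_failed >= 2:
        return "LOC/V"
    if permanently_failed + lost_by_rules >= 1:
        # Assumption: any other combination of lost APUs maps to a PLS.
        return "PLS"
    if (monitor_lost_hours_before_deorbit is not None
            and monitor_lost_hours_before_deorbit > 72):
        return "MDF"  # minimum duration flight
    return "OK"
```

The checks run from most to least severe, so a scenario with both a lost APU and lost monitoring is assigned the PLS state in this sketch.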

6.3.4 Stage 3: Entry, Descent, Landing to Wheelstop (Mission Phase 4)

Appendix  B6.3-3  describes  scenarios  associated  with  the  time 
interval  from  APU  start  at  deorbit  TIG-5  minutes  to  wheelstop. 

The  scenarios  are  presented  in  terms  of  failures  to  start  the 
APUs  after  orbit  (page  1 of  Appendix  B6.3-3)  and  failures  during 
APU operation (pages 2 through 5 of Appendix B6.3-3).


6.3.4.1 Scenarios Involving Readiness of APUs to Start


These scenarios arise largely from flight rules 10-23, 10-2[...],
and 10-28, and from the Entry Checklist (JSC-18540). Normally,
one APU will be started at deorbit TIG-5 and the other two
at 13 minutes before Entry Interface (EI-13). Flight rules,
however, provide for different start times for various APU
failures. These are summarized as notes 1, 2, and 3 on page
1 of Appendix B6.3-3. The following discussion applies to
page 1 of Appendix B6.3-3. An affirmative answer to the
question "2 or 3 APUs ready for start?" means that the flight
rule-enforced delays apply to no more than one APU. The ESD
then asks if the first attempted start of an APU at TIG-5 is
successful. If it is, then the ESD asks if at least one other
is ready to start at EI-13. The start failures at EI-13 are
modeled as part of the questions on pages 2 and 3 of the ESD.

A negative answer to the question "2 or 3 APUs ready for start?"
implies that either all APUs are delayed due to flight rules
or two APUs are delayed. A delay of all APUs is indicated by
a negative answer to the question "1 APU ready for start?".
In this situation, the ESD asks if APUs are ready to start at
EI-13. It is assumed that at least two APUs would be started
before TAEM to support landing. A positive answer to the
question "1 APU ready for start?" is followed by a question
about whether the APU is successfully started using all
available start techniques.

An affirmative answer to the question "2 or 3 APUs ready for
start?" is followed by the question of whether the first APU to
attempt start does so successfully. If the APU starts, then the
other two start attempts are made at EI-13. If the first APU to
attempt starting fails, then the ESD asks if the second APU to
attempt start does so successfully. If this one also fails to
start, then alternate start techniques are employed in an attempt
to provide at least one operable APU before the deorbit burn. If
both  APUs  still  do  not  start,  the  ESD  points  out  that  flight  rules 
recommend  a one  orbit  delay  to  decide  on  a work-around.  Flight 
rules  do  not  provide  guidance  on  the  course  of  action  to  be  taken 
if  a one  orbit  delay  fails  to  provide  a work-around.  Therefore, 
the  ESD  conservatively  assumes  that  a LOC/V  would  result  if  a 
work-around  cannot  be  found  for  at  least  one  APU.  If  one  APU  is 
successfully  started  and  one  has  failed,  the  ESD  recognizes  that 
the  running  APU  would  operate  with  a depressurized  hydraulic 
system  until  EI-13. 

This diagram and the accompanying notes 1, 2, and 3 model the number
of APUs that have successfully started at TIG-5 and the number to
be started at EI-13 or Terminal Area Energy Management (TAEM). The
start and run failure-initiated scenarios are presented on pages 2
through 5 of Appendix B6.3-3 and described below.

6.3.4.2 Start and Run Failure Scenarios

The  initial  failure  categories  for  Stage  3 are  identical  to  those 
of  Stage  1.  With  the  exception  of  the  hydrazine  leakage  initial 
failure  category,  the  subsequent  scenarios  are  also  essentially 
identical. These have been described in Sections 6.3.2.1,
6.3.2.2, 6.3.2.3, and 6.3.2.5. The scenarios during this stage
are  influenced,  however,  by  flight  rules  that  do  not  apply  to 
ascent  or  orbit.  For  example,  if  one  APU  has  failed  before  or 
fails  during  descent,  the  remaining  two  APUs  will  operate  at  high 
speed  starting  at  TAEM  and  automatic  shutdown  will  be  inhibited 
during  the  remainder  of  descent  and  landing.  Furthermore,  hot 
restarts  will  be  attempted  during  descent  to  assure  two  APUs 
operating  before  TAEM.  This  consideration  is  shown  on  page  2 of 
Appendix B6.3-3. Should the answer to the question "2 or more APUs
operate OK?" be negative, then the questions "start recoverable
APU before TAEM?" and "recovered APU runs OK?" are asked. A
negative response to either question would result in a LOC/V
according to the groundrules of this study.

Hydrazine  leakage  scenarios  are  described  below  and  are  presented 
on  page  3 of  Appendix  B6.3-3. 


6.3.4.3 Hydrazine Leakage Scenarios in Stage 3

This  initial  failure  category  includes  hydrazine  leakage  from  any 
part  of  the  APU  into  the  aft  compartment,  the  fuel  pump  seal  drain 
line,  and  the  isolation  valve  or  control  valve  solenoid  cavities. 
The  situation  in  which  hydrazine  contaminates  and  causes  blockage 
of  lube  oil  flow  is  included  within  the  permanent  failure  category. 
The scenarios also include the situation in which a leak may have
developed on orbit but is not detected until entry. Such
situations are modeled as a leak that is detected before blackout.
Scenarios resulting from hydrazine leakage follow a negative
answer to the question "fuel boundary remains intact?".

Many  leakage  locations  allow  hydrazine  to  be  released  into  the  aft 
compartment  during  entry.  The  potential  for  fire  as  the  shuttle 
descends  becomes  quite  an  important  consideration  for  determining 
the  consequences  of  hydrazine  leakage.  Furthermore,  certain 
materials  in  the  aft  compartment  such  as  Kapton  electrical  wire 
insulation  are  vulnerable  to  chemical  attack  by  hydrazine. 

If the leaking APU has not itself failed (i.e., a negative
response to the question "leaking APU failed from other cause?"),
the ESD asks if the leak was detected before blackout. If so, and
there are no previous APU failures, then the flight rules indicate
that the APU would be shut down. If the leak is isolated and the
remaining  fuel  quantity  and  tank  pressure  are  sufficient  to 
support  landing,  then  the  APU  is  potentially  recoverable  at  TAEM. 
Recovery  would  be  attempted,  however,  only  if  a second  APU  is 
needed  for  landing. 

A negative response to the question "fuel quantity and tank
pressure sufficient to support landing?" includes the following
situations:


a.  A severe  leak  such  that  insufficient  fuel  remains  to  support 
landing. 

b. Hydrazine leaks into one of the solenoid cavities,
decomposes, causes a pressure increase inside the valve, and
eventually ruptures the valve. If this occurs in an
isolation valve, the entire contents of the fuel tank could be
dumped into the compartment. This would certainly be a
permanent failure of an APU, with a substantial chance of
damaging flight critical equipment or a second APU. If the
rupture occurs in one of the control valves, then the APU
would be failed, but the amount of hydrazine released would
be limited unless an isolation valve also failed to close.
An underspeed shutdown of the APU would command the
isolation valves to close.


If a leak is not isolated by shutting down the APU and the
remaining  fuel  quantity  and  tank  pressure  are  judged  by  MCC  to 
be  insufficient  to  support  landing,  then  the  APU  would  be  hot 
restarted  and  run  to  fuel  depletion.  The  potential  for  detonation 
exists  if  the  injector  cooling  fails  or  a spurious  APU  start 
occurs  without  sufficient  injector  cooling.  If  the  APU  cannot  be 
restarted,  it  is  considered  to  be  permanently  failed.  Running  an 
APU  with  an  unisolatable  leak  to  fuel  depletion  limits  the  amount 
of  hydrazine  available  to  cause  damage  in  the  aft  compartment. 
Therefore,  the  inability  to  do  this  results  in  a higher  potential 
for  loss  of  a second  APU  or  flight  critical  equipment. 

An  APU  with  an  unisolatable  leak  that  is  judged  able  to  support 
landing  is  not  required  to  be  restarted.  Since  it  appears  that 
relatively  small  leaks  can  allow  enough  hydrazine  accumulation 
in  the  aft  compartment  to  cause  a damaging  fire,  this  course  of 
action  increases  the  chance  of  loss  of  flight  critical  equipment 
or additional APUs. Flight rules indicate that any time an
unisolatable leak causes the tank pressure to reach the minimum
start pressure (100 psia), the APU is to be started so that it
is available to support landing.

The ESD shows different scenarios for the situation in which a
leak is detected before blackout but an APU has previously failed
or been declared lost. The leaking APU would not be shut down.
The rationale given in the flight rules is to avoid risking a
start failure and, thereby, having to land with a single APU. Even
though the chance of fire might be greater than if the APU is shut
down, the flight rules indicate that this is preferable to the
chance of failing to start the leaking APU and attempting a
landing with only one operating APU. The results of this study
(see Section 8) suggest that the conditional probability of a fire
that damages a second APU or flight critical equipment, given a
leak, is far greater than the probability of failing to start an
APU. Therefore, this flight rule may, in fact, increase the risk
of LOC/V in the situation of one APU lost and one leaking.

Leaks that occur at lower altitudes and after blackout are treated
differently by flight rules than those that occur before blackout.
Leaks from the seal cavity with no previous APU failures require
that the leaking APU be shut down. If an APU has previously
failed, then the leaking APU would not be shut down. Leaks into
the aft compartment do not require the APU to be shut down. If
the answer to the question "leaking APU failed from other cause?"
is affirmative, then only questions concerning the potential for
fire and other spatial interactions need to be asked.

All leak scenarios shown on page 3 of Appendix B6.3-3 lead to
questions about the potential for fire in the aft compartment.
After the questions concerning fire, the ESD asks questions about
the spatial interaction events. These are shown on page 4 of
Appendix B6.3-3 and are identical to the questions asked on page 2
of Appendix B6.3-1 and described in Sections 6.3.1.2 and 6.3.1.3.


6.3.4.4 Defining Damage States for Stage 3

The damage states relevant for Stage 3 are LOC/V and OK. A
scenario's damage state depends on the number of APUs failed and
the timing of those failures. Page 5 of Appendix B6.3-3 diagrams
the logic used to define the damage states.

Two APUs lost before touchdown is considered by the model to
result in a LOC/V. If only APU number 1 is lost, then nosewheel
steering is lost, but the crew can successfully steer by
differential braking. If no more than one APU is lost, the model
results in a successful mission.

If APU number 3 and APU number 1 are lost or if APU number 3 and
APU number 2 are lost before wheelstop, the model assumes a
successful mission with one half normal braking power. If all
three APUs are lost before wheelstop but after touchdown, the
model assumes a LOC/V caused by inability to brake and steer.
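
The Stage 3 damage-state rules depend on both which APUs are lost and when. The sketch below is one hedged reading of those rules; the function name and the phase labels are hypothetical. Two points are assumptions: any loss of all three APUs is treated as a LOC/V, and the combination of APU 1 and APU 2 lost (with at most one lost before touchdown) raises an error because the passage does not assign it a damage state.

```python
def stage3_damage_state(loss_phase):
    """Hypothetical Stage 3 damage-state assignment.

    loss_phase maps an APU number (1, 2, or 3) to "before_touchdown" or
    "rollout" (after touchdown, before wheelstop). APUs that never fail
    are simply omitted from the mapping.
    Returns a (damage_state, note) tuple.
    """
    lost = set(loss_phase)
    before_touchdown = {
        apu for apu, phase in loss_phase.items() if phase == "before_touchdown"
    }

    if len(before_touchdown) >= 2:
        return "LOC/V", "two or more APUs lost before touchdown"
    if len(lost) == 3:
        return "LOC/V", "inability to brake and steer"
    if len(lost) <= 1:
        if lost == {1}:
            return "OK", "nosewheel steering lost; differential braking"
        return "OK", "nominal"
    if 3 in lost:
        return "OK", "one half normal braking power"
    # Remaining case: APUs 1 and 2 lost, at most one before touchdown.
    raise ValueError("combination not covered by the passage")
```

Raising an error for the uncovered combination keeps the sketch from silently inventing a damage state the study does not define.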


6.3.5 Stage 4: Wheelstop to Crew Egress (Mission Phase 5)

The APU normally runs for a short time (approximately 10 to 20
minutes) after wheelstop. However, if an APU is leaking, the APUs
are shut down as soon as possible after wheelstop. The crew
remains  with  the  vehicle  for  up  to  about  40  minutes  after  APU 
shutdown.  Appendix  B6.3-4  shows  the  ESD  for  this  stage.  Only 
those  scenarios  that  can  cause  a catastrophic  event  such  that  the 
Orbiter  explodes  or  is  consumed  by  fire  are  of  concern  in  this 
stage.  Failures  of  APUs  cannot  cause  loss  of  crew  or  vehicle, 
unless  the  failures  lead  to  such  a catastrophic  event.  This  stage, 
therefore,  is  included  for  illustration  only.  Quantification  of 
such  scenarios  is  beyond  the  scope  of  this  study.  The  initial 
failure  categories  are  shown  across  the  top  of  Appendix  B6.3-4. 
They  are  as  follows: 


a.  Failure  of  APU  turbine  to  remain  intact  --  this  includes  all 
failures  that  could  generate  shrapnel  from  the  APU  turbine. 

b.  Leakage  of  hydrazine  during  and  after  APU  shutdown  — this 
includes  all  hydrazine  leaks  that  could  potentially  lead  to 
catastrophic  fire. 

c.  Exhaust  gas  leaks  — this  includes  all  large  exhaust  gas 
leaks  that  could  potentially  cause  overheating  and 
detonation  of  hydrazine  within  an  APU. 

d.  Hydrazine  overheating  after  APU  shutdown  — this  includes 
scenarios  in  which  leakage  causes  a fire  which,  in  turn, 
causes  a detonation  and  events,  such  as  failure  to  deenergize 
an  isolation  valve,  that  lead  to  a detonation  of  hydrazine 
without  previous  hydrazine  leakage. 




Following failure of an APU turbine to remain intact, the ESD
questions whether shrapnel is contained, whether a fire occurs, and
whether either one could cause catastrophic Orbiter damage. A yes
to the last question results in a LOC/V. Otherwise, the APUs are
considered to have completed their mission. Examples of scenarios
that would be catastrophic are:


a. Explosion of fuel/oxidizer in the OMS or RCS propellant tanks
after  being  punctured  by  shrapnel 

b.  Detonation  of  hydrazine  in  the  APU  fuel  tanks  leading  to  a 
fire  that  destroys  the  aft  fuselage 

c.  Fire  caused  by  leaking  hydrazine  that  overheats  the  fuel/ 
oxidizer  in  the  OMS  or  RCS  propellant  tanks 


Following  leakage  of  hydrazine,  the  ESD  questions  the  occurrence 
of  a fire  and  whether  the  fire  causes  catastrophic  damage. 

Following an exhaust gas leak, the ESD questions if the hot gas
caused  a detonation,  whether  the  detonation  resulted  in  a fire, 
and  whether  the  fire  caused  catastrophic  damage. 

Following overheating and detonation of hydrazine after APU
shutdown, the ESD questions if the detonation and fuel leak resulted
in  a fire,  and  if  the  fire  caused  catastrophic  damage. 


6.3.6  Summary 

Section 6.3 has discussed the event sequence diagrams (ESDs) used to
develop and illustrate scenarios that begin with initial failure
of the APU and eventually lead to one of five damage states. The
damage states are OK, launch scrub, intact abort, enter at next
PLS opportunity, and LOC/V. A typical Shuttle mission was divided
into four stages for the purpose of modeling with ESDs. The
modeling stages are prelaunch and ascent, orbit, entry through
wheelstop, and wheelstop through crew egress.

Although  ESDs  are  useful  for  the  development  and  communication  of 
scenarios,  they  are  not  adequate  for  quantifying  the  risk  of  the 
APU. Event trees and split fraction models are used for this
purpose and are discussed in the next two sections.

6.4 APU EVENT TREE DEVELOPMENT

The  ESDs  presented  in  the  previous  section  were  developed  to 
clearly describe the sequential flow of events for APU-initiated
scenarios  that  could  lead  to  LOC/V,  launch  scrub,  intact  abort, 
land  at  next  primary  landing  site  opportunity,  or  a successful 
mission. 

Event trees were developed from the ESDs to facilitate quantification
because established computer programs were available for
obtaining  frequencies  of  scenarios  expressed  in  the  form  of 
event  trees.  Because  quantification  is  the  goal  of  an  event 
tree,  the  top  events  need  not  have  a one-to-one  correspondence 
with  the  boxes  in  the  event  sequence  diagrams,  and  the  top 
events  need  not  be  shown  from  left  to  right  in  their  expected 
order  of  occurrence.  Instead,  the  top  events  can  represent  an 
individual  box  in  an  ESD,  a group  of  boxes  in  an  ESD,  or  a 
breakdown  of  an  individual  box.  The  order  of  the  event  tree 
top  events  was  established  to  best  capture  the  inter-event 
dependencies  and  facilitate  the  development  of  scenario-dependent 
split  fractions. 
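
The quantification that an event tree supports can be sketched as follows. This is a minimal illustration, not the computer program used in the study: the frequency of one sequence is taken as the initiating-event frequency multiplied by the split fraction of every branch along the path. The function name and all numerical values are hypothetical, and the split fractions are treated as fixed numbers, whereas the study uses scenario-dependent split fractions.

```python
from math import prod


def sequence_frequency(initiator_frequency, branch_fractions):
    """Initiating-event frequency times every branch fraction along one path."""
    return initiator_frequency * prod(branch_fractions)


# Hypothetical failure split fractions for three top events (not study data).
p_fail = {"TA": 1.0e-4, "PA": 2.0e-3, "DA": 5.0e-3}

# Path: TA succeeds, PA fails, DA fails (a second APU is lost).
path = [1.0 - p_fail["TA"], p_fail["PA"], p_fail["DA"]]
freq = sequence_frequency(1.0, path)

# Sanity check on a tree with only TA and PA: its four sequences
# must together account for the whole initiating-event frequency.
branches = [
    [1.0 - p_fail["TA"], 1.0 - p_fail["PA"]],
    [1.0 - p_fail["TA"], p_fail["PA"]],
    [p_fail["TA"], 1.0 - p_fail["PA"]],
    [p_fail["TA"], p_fail["PA"]],
]
total = sum(sequence_frequency(1.0, b) for b in branches)
```

Because every top event splits a path into complementary branches, the sequence frequencies of a complete tree sum to the initiating-event frequency, which is a convenient check on any quantification.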

The  construction  of  event  trees,  particularly  in  a multi-stage 
model  as  described  in  Section  5,  depends  on  the  analysts'  skill 
and  experience,  knowledge  of  the  data,  and  knowledge  of  the  split 
fraction  models.  The  objective  is  to  best  utilize  the  available 
data  to  obtain  an  accurate  estimate  of  the  frequency  of  each 
scenario. 


6.4.1 Two-Stage Event Tree Model

Section  6.3  includes  descriptions  of  the  potential  scenarios 
during  the  time  frame  from  5 minutes  before  launch  (i.e.,  APU 
start)  to  APU  shutdown  after  wheelstop.  It  was  found  that  two 
event  tree  stages,  called  Stage  A and  Stage  B,  could  adequately 
serve  as  a framework  for  quantification  of  these  scenarios. 

Stage  A served  as  a quantitative  framework  for  those  scenarios 
characteristic  of  the  time  from  5 minutes  before  launch  to  APU 
shutdown  after  the  OMS-1  orbit  insertion  burn.  This  event  tree 
includes  start  failures,  failures  to  continue  running  after 
start,  recoverable  failures,  and  failures  to  successfully  close 
the  fuel  tank  isolation  valves  upon  APU  shutdown. 

Stage B served as a quantitative framework for those scenarios
characteristic of orbit, entry, and landing through APU shutdown.
It includes start failures, failures to continue running after
start, recoverable failures, and attempts to recover APUs.
Combining the ESDs from orbit, entry/landing, and post wheelstop
does not compromise the accuracy of the estimates of the damage
state fractions. The ability to identify whether certain failures
occurred in orbit or during entry/landing is lost. However, for
the purposes of this study, this is not considered to be a
significant loss.


The quantification of Stage A results in determination of the
fraction of ascents that end in each damage state. The quantification
of Stage B results in determination of the fraction of
flights that end in each damage state.


The Stage A Event Tree (Appendix B6.4-1) consists of the initial
event, which is the attempted start of the APUs in the Orbiter,
followed by 21 top events, and ends with the damage state of each
sequence. The damage state is shown in Appendix B6.4-1 as an "x"
below one of the following: loss of crew or vehicle (LV), launch
scrub (LS), intact abort (IA), or land at next primary landing
site (PLS). Also shown is a summary of the number of APUs leaking
(the number below NL), the number of spurious shutdowns (the
number below NS), the number of permanent failures (the number
below NF), and whether the scenario must be continued in the next
stage (an X under EL). Taken together, a line of Xs and numbers
at the end of a sequence in the event tree is called a damage
vector. Each sequence is associated with a damage vector. Two
or more sequences may have the same damage vector. A transfer in
the tree (e.g., XFR1) means that the dotted line is to be
replaced by a previously defined group of sequences. For
example, a dotted line that ends with XFR1 is to be replaced by
the group of sequences and associated damage vectors to the right
of the "X1" mark beneath top event "BA". A transfer is not used
unless both the sequence of events and the associated damage
vector are appropriate to replace the dotted line. A legend is
provided on the first page of Appendix B6.4-1, Appendix B6.4-2,
and Appendix B6.4-3 that describes the top event designators and
damage state designators for Stage A and Stage B Event Trees.
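
One way to picture a damage vector is as a small immutable record, sketched below. The class and field names are hypothetical and chosen only to mirror the columns described above (end state, NL, NS, NF, and the continuation mark); the fractions are illustrative, not study results. Making the record hashable lets several sequences that share a damage vector be accumulated under one key.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DamageVector:
    end_state: Optional[str]   # "LV", "LS", "IA", "PLS", or None if no column is marked
    n_leaking: int = 0         # number shown below NL
    n_spurious: int = 0        # number shown below NS
    n_failed: int = 0          # number shown below NF
    continues: bool = False    # scenario is carried into the next stage


# Hypothetical sequences: (damage vector, fraction of ascents).
sequences = [
    (DamageVector("PLS", n_failed=1, continues=True), 3.0e-4),
    (DamageVector("PLS", n_failed=1, continues=True), 1.0e-4),
    (DamageVector("LV", n_failed=2), 2.0e-6),
]

# Two sequences with the same damage vector collapse onto one key.
by_vector = defaultdict(float)
for vector, fraction in sequences:
    by_vector[vector] += fraction
```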


In  the  general  case  of  a two-stage  model,  each  damage  vector 
serves  not  only  as  the  end  state  of  Stage  A but  as  an  initial 
condition  of  Stage  B.  An  initial  condition  defines  the  failures 
that  begin  each  Stage  B quantification.  In  general,  a Stage  B 
event  tree  must  be  quantified  for  each  Stage  A damage  vector. 

The fraction of each Stage A damage vector (which is the same as
the fraction of its associated sequence) serves as the frequency
of the initial event for Stage B. That is, the fraction of
missions ending in each Stage B sequence is multiplied by the
same factor, namely, the damage state fraction that serves as
the initial condition for the event tree.
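
The multiplication described above can be sketched in a few lines. This is an illustration under stated assumptions, not the study's calculation: the function name, the two initial conditions, and every number are hypothetical. Each Stage A fraction scales the conditional Stage B result for the same initial condition, and the Stage A loss fraction is added directly because it needs no Stage B quantification.

```python
def mission_lv_fraction(stage_a_lv, stage_a_fraction, stage_b_lv_given):
    """Stage A loss fraction plus each carried-over Stage A fraction
    times the Stage B loss fraction conditional on that initial condition."""
    carried = sum(
        stage_a_fraction[key] * stage_b_lv_given[key] for key in stage_a_fraction
    )
    return stage_a_lv + carried


# Hypothetical values, for illustration only.
stage_a_lv = 1.0e-5
stage_a_fraction = {"one APU failed": 3.0e-3, "no failures": 0.99}
stage_b_lv_given = {"one APU failed": 1.0e-2, "no failures": 1.0e-5}

total_lv = mission_lv_fraction(stage_a_lv, stage_a_fraction, stage_b_lv_given)
```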


In  most  applications  of  a two-stage  model,  and  this  was  no 
exception,  many  damage  vectors  have  nearly  the  same  impact  on  the 
Stage  B model.  Many  of  the  damage  vectors  lead  to  quantification 
of  Stage  B with  essentially  the  same  initial  failures.  This 
suggests  that  many  damage  vectors  can  be  grouped  together  in  what 
are  called  "damage  bins"  and  the  frequencies  of  the  grouped  damage 
vectors  can  be  summed  to  obtain  the  total  damage  bin  frequency.  The 
Stage  B Event  Tree,  therefore,  need  only  be  quantified  for  each 
damage  bin  rather  than  for  each  damage  vector. 

The  damage  bin  is  characterized  by  a set  of  failures  that  serves 
as  initial  conditions  for  Stage  B and  by  a fraction  of  ascents 
that  lead  to  the  particular  bin.  Having  accepted  the  notion  of 
"binning",  it  was  also  recognized  that  certain  damage  vectors  have 
low  frequency  of  occurrence  and  may  conservatively  be  represented 
by  a damage  bin  with  a much  larger  frequency  of  occurrence.  In 
this  case  the  word  "conservative"  means  that  the  status  of  the  APUs 
as  characterized  by  the  damage  bin  is  worse  than  the  low  frequency 
damage  vector  that  it  is  grouped  with.  In  this  application  the 
following  damage  bins  have  been  defined. 

a.  All  damage  vectors  with  an  "x"  under  LV  were  grouped  into  a 
bin  for  loss  of  crew  or  vehicle. 

b.  All  damage  vectors  with  an  "x"  under  LS  were  grouped  into  a 
bin  for  launch  scrub. 

c.  All  damage  vectors  with  an  "x"  under  IA  were  grouped  into  a 
bin  for  intact  abort. 

All  damage  vectors  with  an  "x"  under  PL  were  grouped  into  three 
bins  representing  three  groups  of  APU  damage  having  similar 
effects  on  the  ability  to  land  at  the  next  primary  landing  site 
opportunity.  These  three  bins  were  as  follows: 

d.  Damage  vectors  with  one  APU  lost 

e.  Damage  vectors  with  one  APU  leaking 

f. Damage vectors with one APU lost and one APU leaking

All damage vectors with no failures were grouped into an "OK" bin.

The first three bins above need not serve as initial conditions
for Stage B because loss of crew or vehicle, launch scrub, and
intact abort are the end states of interest. It is also of
interest, however, to assess the chance that Stage A sequences
which have been declared as PLS or were OK end in loss of crew or
vehicle. Therefore, the last four damage bins (three for PLS and
one for OK) serve as initial conditions for Stage B. Scenarios
exhibiting spurious shutdowns were grouped with bins 4, 5, or 6
depending on the scenario. A detailed description of the binning
logic is shown in Table 6.4.1.
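
The grouping of damage vectors into the seven bins can be sketched as below. This is a simplified illustration, not the binning logic of Table 6.4.1: the function name and the frequencies are hypothetical, an "OK" end-state label is assumed for vectors with no failures, and the scenario-dependent placement of spurious shutdowns into bins 4, 5, or 6 is deliberately left out.

```python
from collections import defaultdict


def assign_bin(end_state, n_failed, n_leaking):
    """Map a simplified damage vector to a damage bin number (1 to 7)."""
    if end_state == "LV":
        return 1                      # loss of crew or vehicle
    if end_state == "LS":
        return 2                      # launch scrub
    if end_state == "IA":
        return 3                      # intact abort
    if end_state == "PLS":
        pls_bins = {(1, 0): 4,        # one APU lost
                    (0, 1): 5,        # one APU leaking
                    (1, 1): 6}        # one APU lost and one APU leaking
        if (n_failed, n_leaking) in pls_bins:
            return pls_bins[(n_failed, n_leaking)]
    if end_state == "OK" and n_failed == 0 and n_leaking == 0:
        return 7                      # no failures
    raise ValueError("damage vector not covered by this simplified sketch")


# Hypothetical damage vectors: (end state, NF, NL, fraction of ascents).
vectors = [
    ("PLS", 1, 0, 3.0e-4),
    ("PLS", 1, 0, 1.0e-4),
    ("PLS", 0, 1, 5.0e-5),
    ("LV", 2, 0, 2.0e-6),
    ("OK", 0, 0, 0.99),
]

# Sum the frequencies of all vectors that fall in the same bin.
bin_frequency = defaultdict(float)
for end_state, n_failed, n_leaking, fraction in vectors:
    bin_frequency[assign_bin(end_state, n_failed, n_leaking)] += fraction
```

With the frequencies summed per bin, Stage B needs one quantification per populated bin rather than one per damage vector.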

Damage  bin  number  7 served  as  the  initial  condition  for  the  Stage 
B Event  Tree  called  Stage  B7.  Damage  bin  number  4 served  as  the 
initial  condition  for  the  Stage  B Event  Tree  called  Stage  B4. 

The Stage B7 Event Tree is shown in Appendix B6.4-2 and the Stage
B4 Event Tree in Appendix B6.4-3. These illustrate how the
initial  conditions  affected  the  number  and  variety  of  sequences 
during  Stage  B.  Only  two  damage  bins  were  required  to  define 
the  end  states  of  Stage  B.  These  were  loss  of  crew  or  vehicle 
(LV)  and  OK. 


6.4.2  Stage  A Event  Tree 

The Stage A Event Tree is shown in Appendix B6.4-1. It models
the time period from APU start before launch to APU shutdown
after the OMS-1 orbital insertion burn.


6.4.2.1 Relationship of ESD to Stage A Event Tree

Table  6.4.2  presents  a summary  description  of  each  top  event  in 
the  Stage  A Event  Tree  (refer  to  Appendix  B6.4-1  for  the  event 
tree  itself).  Table  6.4.3  relates  each  top  event  in  the  Stage  A 
Event  Tree  to  one  or  more  ESD  questions. 


6.4.2.2 Construction of the Stage A Event Tree

The  assumptions,  groundrules  and  approximations  used  to  construct 
the  tree  were  as  follows: 

a. APU failure was defined as the inability to power its associated
hydraulic pump to the extent necessary to maintain
adequate hydraulic pressure at the expected hydraulic demand.

b. Two APU failures lead to loss of crew or vehicle (LV).

DAMAGE BIN ASSIGNMENTS - STAGE A


Notes:

1. Three spurious shutdowns or one permanent failure
and two spurious shutdowns were conservatively
assumed to be LOC/V


TABLE 6.4.2

TOP EVENT DEFINITIONS - APU EVENT TREE - STAGE A*

Event   Definition

IE      Demand for APU Start

HY      Hydraulic System Failure**

TA      Turbine Overspeed

PA      Equipment Failure of One APU After it Starts

DA      Failure of the Second APU After it Starts

CA      Failure of the Second APU or Failure of Flight
        Critical Equipment Owing to Spatial Interactions
        Initiated by Failure of the First APU

HA      Failure of One APU Owing to Exhaust Gas Leak

GA      Failure of Flight Critical Equipment or the
        Second APU Owing to Exhaust Gas Leak

L1      Leakage of Hydrazine From APU 1

L2      Leakage of Hydrazine From APU 2

L3      Leakage of Hydrazine From APU 3

FA      Failure of Flight Critical Equipment or Two APUs
        Owing to Spatial Interactions Initiated by
        Hydrazine Leakage

C1      Hydrazine Leakage Causes Failure of APU 1 Given
        That Two APUs Have Not Failed

C2      Hydrazine Leakage Causes Failure of APU 2 Given
        That Two APUs Have Not Failed

C3      Hydrazine Leakage Causes Failure of APU 3 Given
        That Two APUs Have Not Failed

S1      Spurious Shutdown of APU 1

S2      Spurious Shutdown of APU 2

S3      Spurious Shutdown of APU 3

BA      Failure of One or Two APUs Upon Start or While
        Running Before Launch

EA      Failure Occurs in the Thrust Bucket

MA      Failure Occurs After MECO

IA      Intact Abort Called by MCC

*   Stage A Event Tree is shown in Appendix B6.4-1.

**  This top event is included to show how an event
    tree can include scenarios that cross subsystem
    boundaries. Quantitative evaluation of the
    hydraulic system is out-of-scope.


TABLE 6.4.3

RELATIONSHIP OF STAGE A EVENT TREE TOP EVENTS
TO APU ESD 1 - PRELAUNCH AND ASCENT*

Event        Questions from Appendix B6.3-1 & Table 6.4.2

HY           "Hydraulic System OK" and All Boxes Beneath that
             Question

TA, DA       "Turbine Speed Control OK" and All Boxes Beneath that
             Question

PA, DA       "No Permanent APU Failures" and All Boxes Beneath that
             Question

             This event also includes the question "Fuel Isolation
             Valves Close Within 10 Minutes After APU Shutdown" and
             All Boxes Beneath it in Appendix B6.3-2

CA           All questions following "SIE". They include:

             "SIE Does Not Fail Flight Critical Equipment"

             "SIE and Initial Failure Does Not Cause Two APUs to
             Fail"

             "SIE and Initial Failure Does Not Cause the Second APU
             to Fail With One Already Failed"

             The Above Questions Relate to Spatial Interactions that
             Follow Failures Involving Shrapnel.

HA, GA       "Exhaust Gas Boundary Remains Intact" and All Spatial
             Interaction Questions Beneath It. The Spatial
             Interaction Questions Now Refer Only to the Damage
             Potentially Caused by Exhaust Gas Release.

L1, L2, L3   "Fuel Boundaries Remain Intact"

FA           "Sufficient Oxygen for Fire in Aft Compartment"

             "Fire in Aft Compartment" and All Questions Following
             "SIE". The Spatial Interaction Questions Now Refer to
             the Damage of Flight Critical Equipment or APUs
             Potentially Caused by Hydrazine in the Aft Compartment.

C1, C2, C3   "Remaining Fuel Quantity Sufficient to Support Landing"

             "Leak Isolated"

             "Leak Detected and APU Shutdown Before Fuel Quantity
             and Tank Pressure Depleted"

             "Sufficient Oxygen For Fire in Aft Compartment"

             "Fire in Aft Compartment"

             All Questions Following "SIE". These Spatial Interaction
             Questions Now Refer to Damage of an Individual APU
             Potentially Caused by Hydrazine in the Aft Compartment

S1, S2, S3   "No Recoverable Failures"

             Spurious Shutdowns and Isolatable Leaks Were Modeled as
             Recoverable Failures

BA           The Question "Has Liftoff Occurred" and Questions Below
             It

             This Top Event Determines the Fraction of Each Scenario
             That Occurs Before or After Launch. It is Used to
             Decide on Whether the Scenario Ends in Launch Scrub or
             LOC/V.

MA           This Top Event Does Not Appear on an ESD. It Was Added
             to the Event Tree to Distinguish Failures After MECO
             That Would Not Contribute to Intact Aborts.

EA           "Has Thrust Bucket Started?"

             "Has Thrust Bucket Ended?"

IA           "Second APU/Hydraulics Loss Impending"

             "Will Failing APU/Hydraulics Not Support PLS?"

             "Will Failing APU/Hydraulics Not Support Intact Abort?"

*   Stage A Event Tree is shown in Appendix B6.4-1.


c. All failures except leakage and spurious shutdown have been
modeled as permanent or nonrecoverable.

d. The event tree was quantified from APU start (liftoff minus
5 minutes) to APU shutdown on orbit. Failure of a fuel tank
isolation valve to close upon attempted shutdown was
conservatively modeled as a permanent failure.

e. A large hydrazine leak was defined as a leak for which the APU
would deplete all usable fuel before the end of the flight.

f. Any modeled failure of any APU that occurred before launch
was assumed to lead to launch scrub, with one exception.
Shrapnel and hydrazine-generated failures of flight critical
equipment from turbine overspeed were conservatively assumed
to result in loss of crew or vehicle, even if they occurred
on the pad.

g. With one exception, the APUs were assumed to be identical
and spatially symmetrical to each other so that frequencies
and consequences were independent of which APU had failed.
This allowed APU 3 to be assigned as the failed APU with no
loss of generality or quantitative accuracy when the failures
under TA, PA, or HA occur. The exception was leakage. The
conditional probability of failing APU 3 given a leak in APU
1 or APU 2 or both (top event C3) was lower than the conditional
probability of failing APU 1 or 2, given a leak in
either or both of these APUs. Similarly, the conditional
probability of failing APU 1 or 2 due to a leak in APU 3
(top events C1 and C2) was much lower than the conditional
probability of APU 3 failing itself.

h. The possibility of two APUs failing independently in the same
flight from turbine overspeed was not modeled because the
frequency of this sequence was much smaller than the frequency
of sequences leading to loss of crew or vehicle that involve
one turbine overspeed with other failures.

i. The frequency of failure of a running APU before launch is
approximated by a function of the ratio of time it runs
before launch to the total time from five minutes before
lift-off to APU shutdown. All start failures were modeled
as occurring before launch.

j. The APUs were modeled as if each one had its own auto shutdown
inhibit switch (a post-51L modification).


k. Two spurious shutdowns or a permanent failure and a spurious
shutdown were assumed to result in loss of crew or vehicle if
they occurred before MECO. However, if one occurred after
MECO, the spurious shutdown was treated as a recoverable failure
for entry/landing. Sequences involving three spurious shutdowns
or one permanent failure and two spurious shutdowns
were not explicitly shown in the event tree because of the
extremely small chance of occurrence.

l.  An  APU  exhibiting  a malfunction  which  by  Flight  Rules 
would  cause  MCC  to  declare  it  lost  was  assumed  to 
operate  until  after  MECO. 

m.  Hot  restarts  were  not  modeled  in  Stage  A since  they 
must  occur  after  APU  shutdown  post-MECO. 

n.  If  the  same  APU  exhibits  both  a spurious  shutdown  and 
a hydrazine  leak,  the  damage  vector  shows  it  as  a 
hydrazine  leak.  This  was  a conservative  assignment 
because  of  the  relatively  high  conditional  probability 
of  cascading  damage,  given  a leaking  APU  during 
descent. The net effect on the quantitative results
is  small  because  a leaking  APU  will  not  be  used  during 
Stage  B unless  another  APU  fails. 

o.  The  frequency  of  failures  occurring  after  MECO  was 
modeled  as  a function  of  the  ratio  of  the  time  from 
MECO  to  APU  shutdown  to  the  total  Stage  A time. 

p.  Any  APU  failure  or  spurious  shutdown  that  occurred  in 
the  thrust  bucket  was  assumed  to  lead  to  an  intact  abort. 

The  frequency  of  a failure  occurring  in  the  thrust 
bucket  was  modeled  as  a function  of  the  ratio  of  the 
time  in  the  thrust  bucket  to  the  total  Stage  A time. 

q.  Any  APU  exhibiting  a failure  or  a spurious  shutdown 
can  also  exhibit  a hydrazine  leak. 
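
Several of the groundrules above apportion running failures among mission phases as a function of a time ratio. The sketch below assumes the simplest such function, a direct proportion (equivalent to a constant failure rate over Stage A); the text does not state the exact function used. Only the 300-second prelaunch run time comes from the text (APU start at liftoff minus 5 minutes). Every other duration, the phase names, and the running-failure fraction are hypothetical.

```python
def phase_fractions(durations_s):
    """Share of the total Stage A time spent in each phase."""
    total = sum(durations_s.values())
    return {phase: t / total for phase, t in durations_s.items()}


durations_s = {
    "prelaunch": 300.0,             # APU start at liftoff minus 5 minutes
    "ascent_before_bucket": 30.0,   # hypothetical
    "thrust_bucket": 30.0,          # hypothetical
    "ascent_after_bucket": 450.0,   # hypothetical
    "post_meco": 400.0,             # hypothetical
}

fractions = phase_fractions(durations_s)

# Hypothetical fraction of ascents with a running failure during Stage A,
# apportioned to each phase in proportion to time.
run_failure_fraction = 1.0e-3
by_phase = {phase: run_failure_fraction * f for phase, f in fractions.items()}
```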


6.4.3  Description  of  Stage  A Top  Events 

A summary  description  of  each  top  event  and  its  relationship  to 
the  rest  of  the  Stage  A Event  Tree  is  provided  in  this  section. 
The  detailed  model  that  provides  the  basis  for  assessing  the 
frequency  of  occurrence  of  each  top  event  split  fraction  is 
provided in Section 6.5. The data required to quantify these
models are described in Section 7.

Top Event HY: Hydraulic System Failure

This  event  was  included  as  an  illustration  of  how  an  event  tree 
can  include  scenarios  that  cross  subsystem  boundaries.  A failure 
of  HY  implies  that  its  associated  APU  is  useless.  The  event  tree, 
therefore,  treats  HY  failure  as  if  an  APU  has  failed. 

Top Event TA: Turbine Overspeed

This event occurs if both the primary and secondary fuel control
valves fail in the open position while the APU is operating and
the overspeed trip fails to close the secondary valve. Closure
of the fuel tank isolation valves following an overspeed trip may
not prevent turbine runaway and shrapnel caused by turbine runaway.
The hydrazine quantity downstream of the isolation valves may be
sufficient, given the presence of bubbles or effective suction in
the APU fuel pump, to allow the turbine to reach breakup speed.

Mechanical, electrical and controller causes of turbine overspeed
were included. Turbine overspeed implies that the APU has
failed. It was then appropriate to ask if the resulting shrapnel
and hydrazine escape could have caused a second APU or other
flight critical equipment in the aft compartment (i.e., top event
CA) to fail. The tree also asks if another APU could have failed
independently from the turbine overspeed either by equipment
failure (e.g., top event DA) or by leakages. Occurrence of this
event after launch and in the absence of other failures leads to
a PLS entry unless it occurs in the thrust bucket. In that case,
it leads to an intact abort.


Top Event PA: APU Equipment Failure After APU Start

This event occurs if any equipment failure or failures combine to
prevent an APU from providing sufficient power to its hydraulic
pump as defined above. For example, this event includes break-up
of the turbine rotor at normal speed. However, this event
excludes turbine overspeed, leakages, spurious shutdowns, and
start failures. This top event does not include failures caused
by erroneous commands from sources external to the APU (e.g.,
from the crew or MCC). These failures are outside the scope of
this study. The combinatorial failures included in this top event
are described in detail in Section 6.5. Occurrence of this event
after launch and in the absence of other failures leads to a PLS
entry unless it occurs in the thrust bucket. In that case, the
event leads to an intact abort.

Top Event DA: Failure of Second APU After APU Start

This event is asked if either PA or TA has occurred. It includes
failure  of  a second  APU  given  that  one  APU  has  failed.  The  same 
combinations  of  equipment  failures  that  contribute  to  PA  are  also 
relevant  here.  Occurrence  of  this  event  after  launch  leads  to 
loss  of  crew  and  vehicle. 

Top Event CA: Spatial Interaction Failure of Second APU or
Flight Critical Equipment

This event includes failure of a second APU or flight critical
equipment due to shrapnel or hydrazine induced cascading damage.
It considers the possibility that shrapnel and hydrazine leakage
could be produced by turbine rotor break-up, either in an overspeed
or normal speed condition. The sequence of events involving
both  TA  and  CA,  then,  would  lead  to  loss  of  crew  and  vehicle  from 
turbine  shrapnel  or  leaking  hydrazine.  The  sequence  of  events 
involving  both  PA  and  CA  would  be  caused  by  one  of  the  failures 
included  in  the  PA  split  fraction  model,  namely,  turbine  rotor 
break-up.  The  subsequent  events  are  identical  to  those  for  TA  and 
PA,  but  with  a different  frequency. 

Top  Event  HA:  Exhaust  Gas  Leakage  Fails  One  APU 

This  event  includes  the  possibility  that  exhaust  gas  leakage  can 
fail  an  APU.  Occurrence  of  this  event  after  launch  and  in  the 
absence  of  other  failures  leads  to  a PLS  entry  unless  it  occurs 
in the thrust bucket. In that case, the event leads to an intact
abort.

Top Event GA: Exhaust Gas Leakage Fails Second APU

This event includes the possibility that exhaust gas leakage
fails a second APU given that one APU is known to have failed
from  exhaust  gas  leakage  or  from  other  causes.  Occurrence  of 
this  event  after  launch  leads  to  loss  of  crew  and  vehicle. 

Top Event L1: Hydrazine Leakage in APU 1

This event includes leakages of hydrazine into the aft compartment
from anywhere in APU 1.

Top Event L2: Hydrazine Leakage in APU 2

This event includes leakages of hydrazine into the aft compartment
from anywhere in APU 2.

Top Event L3: Hydrazine Leakage in APU 3

This event includes leakages of hydrazine into the aft compartment
from anywhere in APU 3.

The event tree structure involving L1, L2, and L3 includes all
combinations of APUs leaking individually or together in the same
mission. After the questions about leakage, it was appropriate to
ask about potential cascading damage caused by free hydrazine in
the aft compartment. Occurrence of any detected leak would
cause mission control to declare that APU lost and lead to a PLS
entry, according to Flight Rules.


Top Event FA: Leakage-Induced Failure of Two APUs or Flight
Critical Equipment

This event includes those spatial interactions stemming from the
presence of hydrazine in the aft compartment that could cause
failure of at least two APUs or other flight critical equipment.
In the scenarios in which one APU has already failed, this event
includes failure of a second APU or flight critical equipment.
Occurrence of this event after launch leads to loss of crew and
vehicle.

Top Event C1: Leakage-Induced Failure of APU 1

This event includes spatial interaction induced failure of APU 1
from the presence of hydrazine in the aft compartment, given that
two APUs have not already failed. Occurrence of this event after
launch and in the absence of other failures leads to a PLS entry
unless it occurs in the thrust bucket. In that case, it leads to
an intact abort.

Top Event C2: Leakage-Induced Failure of APU 2

This event includes spatial interaction induced failure of APU 2
from the presence of hydrazine in the aft compartment, given that
two APUs have not already failed. Occurrence of this event after
launch and in the absence of other failures leads to a PLS entry
unless it occurs in the thrust bucket. In that case, it leads to
an intact abort.

Top Event C3: Leakage-Induced Failure of APU 3

This event includes spatial interaction induced failure of APU 3
from the presence of hydrazine in the aft compartment, given that
two APUs have not already failed. Occurrence of this event after
launch and in the absence of other failures leads to a PLS entry
unless it occurs in the thrust bucket. In that case, it leads to
an intact abort.

In any sequence that includes a leaking APU, C1, C2, and C3 are
asked in order to account for the possibility that leakage from
one APU could fail another APU. Although the leakages themselves
(occurrence of L1, L2, or L3) are potentially recoverable if
needed to support landing, the additional occurrence of C1, C2,
or C3 implies a permanent, non-recoverable failure.

Top Events S1, S2, and S3: Spurious Shutdown

This event includes equipment failures of APU 1 (S1), APU 2
(S2), or APU 3 (S3) that cause a spurious shutdown of the
affected APU. For example, MPU 1 failing high could cause the
controller to sense an overspeed and shut down the APU. It was
assumed that this condition can be identified during orbit, so
that the APU could be started if needed to have two operating
APUs during descent. Should any such shutdown occur in the
thrust bucket, an intact abort occurs. Should a spurious shutdown
occur before or after MECO, a PLS entry is assumed. Should
two shutdowns occur before MECO, a loss of crew and vehicle results.

Top  Event  BA:  Failure  Occurs  Before  Launch 

This  event  includes  all  combinations  of  start  failures  of  any  or 
all  APUs.  It  also  includes  that  fraction  of  running  failures  of 
any  or  all  APUs  that  occur  before  launch.  Occurrence  of  this 
event  leads  to  a launch  scrub. 

Top  Event  EA:  Failure  Occurs  in  the  Thrust  Bucket 

This  event  includes  those  failures  that  occur  in  the  thrust  bucket 
and  is  assumed  to  lead  to  an  intact  abort.  It  was  quantified  as  a 
function  of  the  ratio  of  time  in  the  thrust  bucket  to  the  total 
Stage  A time. 

Top Event MA: Failure Occurs After MECO

This event includes those failures that occur after MECO. This is
a significant time because the APUs are not needed for throttling
functions after the main engines have shut down. They are, however,
needed for TVC during the MPS dump, which is not considered a
safety critical event for this study. Any recoverable or permanent
APU failure occurring after MECO leads to a PLS entry.

Top Event IA: Intact Abort Called by MCC

If  one  APU  has  failed  and  another  was  leaking  before  MECO,  the 
flight  rules  provide  for  the  MCC  to  make  a decision  as  to  the 
ability  of  the  leaking  APU  to  support  a landing.  If  the  APU  leak 
is  large  enough  so  that  the  APU  will  not  support  a landing  at  the 
next primary landing site opportunity, then the MCC may declare
an  intact  abort  to  allow  the  shuttle  to  return  as  soon  as  possible. 
Occurrence  of  this  event  leads  to  an  intact  abort  in  the  event 
tree. 
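
The end-state rules stated for the individual top events can be collected into a small classifier, sketched below. It is a deliberate simplification and not the event tree itself: it covers permanent APU failures only, the function and phase names are hypothetical, and it omits spurious shutdowns, leaks, the MCC-called intact abort (IA), and the exception for turbine overspeed damage to flight critical equipment on the pad.

```python
def stage_a_end_state(phase, n_failed):
    """End state for permanent APU failures only (simplified sketch).

    phase is when the first failure occurs: "prelaunch", "thrust_bucket",
    "ascent" (after launch, outside the thrust bucket), or "post_meco".
    """
    if n_failed == 0:
        return "OK"
    if phase == "prelaunch":
        return "LS"    # failures before launch lead to a launch scrub
    if n_failed >= 2:
        return "LV"    # two APU failures lead to loss of crew or vehicle
    if phase == "thrust_bucket":
        return "IA"    # a single failure in the thrust bucket leads to an intact abort
    return "PLS"       # otherwise, land at the next primary landing site


examples = {
    ("prelaunch", 1): stage_a_end_state("prelaunch", 1),
    ("thrust_bucket", 1): stage_a_end_state("thrust_bucket", 1),
    ("ascent", 1): stage_a_end_state("ascent", 1),
    ("ascent", 2): stage_a_end_state("ascent", 2),
    ("post_meco", 1): stage_a_end_state("post_meco", 1),
}
```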


6.4.4 Stage B Event Trees

The Stage B Event Trees are shown in Appendices B6.4-2 and
B6.4-3. They model the time from APU shutdown after the orbital
insertion burn to APU shutdown after wheelstop.


6.4.4.1 Relationship of ESD to Stage B Event Trees

Table 6.4.4 presents a summary description of each top event in
the Stage B Event Trees (refer to Appendices B6.4-2 and B6.4-3
for the event trees themselves). Table 6.4.5 relates each top
event in the Stage B Event Trees to one or more ESD questions.


6.4.4.2 Construction of the Stage B Event Trees

The Stage B7 Event Tree (Appendix B6.4-2) was initiated by the
OK damage bin described in Section 6.4.1 (also called Impact
Vector 1). It must represent scenarios consisting of up to two
APU failures in order to result in the LOC/V damage state. The
Stage B4 Event Tree (Appendix B6.4-3) was initiated by damage
bin No. 4, described in Section 6.4.1 (also called Impact Vector
2), which consists of Stage A scenarios ending with one APU
failed. The Stage B4 Event Tree is far simpler because we need
only represent scenarios consisting of no APU failures or one APU
failure in order to result in the LOC/V damage state.


Because accuracy and completeness of the modeling and
quantification effort are important in those areas of the study
that can potentially contribute most to the risk, standard practice
in multi-stage modeling is to estimate the potential contribution
to the total mission risk from each Stage A damage bin. This allows
the allocation of the study resources (e.g., manpower, time, and
money) to those areas that are estimated to be the most important
contributors to the total mission risk.

It  was  determined  for  this  study  that  Stage  B Event  Trees  for 
damage  bins  5 and  6 need  not  be  developed  because  of  their 
extremely  low  frequency  of  occurrence.  That  is,  the  resources 
required  to  develop  event  trees  and  split  fraction  models,  and  to 
perform  quantification  for  bins  5 and  6 would  be  wasted  because 
these  bins  could,  at  most,  contribute  less  than  one  percent  of 
the  total  frequency  of  loss  of  crew  or  vehicle  for  the  total 
flight. 

In view of this, it was decided to allocate resources to the
detailed analysis of the top 99% of the potential total mission
risk. However, the contributions of damage bins 5 and 6 are not
neglected. They were conservatively assumed to lead to loss of
crew or vehicle when all of the contributors to the LOC/V state
for the entire flight were added up. This is standard practice
for PRA.
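
The conservative treatment of the screened damage bins can be sketched as below. This is an illustration under stated assumptions: each screened bin is added to LOC/V at its full frequency (conditional LOC/V probability taken as 1.0), and every number is a hypothetical placeholder rather than a study result.

```python
def bound_screened_bins(detailed_locv: float, screened_bin_freqs: dict) -> dict:
    """Add undeveloped damage bins to the LOC/V total at their full
    frequency, i.e. assume each one always ends in LOC/V."""
    screened_total = sum(screened_bin_freqs.values())
    total = detailed_locv + screened_total
    return {"total_locv": total, "screened_share": screened_total / total}


# Hypothetical per-mission frequencies, not values from the study.
result = bound_screened_bins(1.0e-4, {"bin5": 4.0e-7, "bin6": 1.0e-7})
```

If the screened share stays below the screening threshold (one percent in the text above), the bound adds little to the total while guaranteeing the bins are not underestimated.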

The  assumptions,  groundrules  and  approximations  used  to  construct 
the  Stage  B trees  are  as  follows: 

a.  APU  failure  is  defined  as  the  inability  to  power  its 
associated  hydraulic  pump  to  the  extent  necessary  to 
maintain  adequate  hydraulic  pressure  at  expected 
hydraulic demand.

TABLE 6.4.4

TOP EVENT DEFINITIONS - APU EVENT TREE - STAGE B*

Event  Definition

IE     Damage Bin From Stage A

SS     One APU Fails to Start

DS     Second APU Fails to Start

TB     Turbine Overspeed

PB     Equipment Failure of One APU After it Starts

DB     Failure of the Second APU After it Starts

CB     Failure of the Second APU or Failure of Flight Critical
       Equipment Owing to Spatial Interactions Initiated by
       Failure of the First APU

HB     Failure of One APU Due to Exhaust Gas Leak, or GGVM
       Detonation

GB     Failure of Flight Critical Equipment or the Second APU
       Due to Exhaust Gas Leak, or Valve Detonation

M1     Leakage of Hydrazine from APU 1

M2     Leakage of Hydrazine from APU 2

M3     Leakage of Hydrazine from APU 3

FB     Failure of Flight Critical Equipment or Two APUs
       Due to Spatial Interactions Initiated by Hydrazine
       Leakage

D1     Hydrazine Leakage Causes Failure of APU 1 Given that
       Two APUs Have Not Failed

D2     Hydrazine Leakage Causes Failure of APU 2 Given that
       Two APUs Have Not Failed

D3     Hydrazine Leakage Causes Failure of APU 3 Given that
       Two APUs Have Not Failed

R1     Leak in APU 1 Before EI-13 or into Pump Seal Cavity

R2     Leak in APU 2 Before EI-13 or into Pump Seal Cavity

R3     Leak in APU 3 Before EI-13 or into Pump Seal Cavity

T1     Spurious Shutdown of APU 1

T2     Spurious Shutdown of APU 2

T3     Spurious Shutdown of APU 3

TE     Failure of at Least One APU After TAEM-3.5 Minutes

PW     Failure of at Least One APU After Wheelstop

RE     Failure to Recover APU When Needed For Landing

SB     Uninhibited Spurious Shutdown of at Least One APU
       (Applies Only for Impact Vector Two)


* Stage B Event Trees are Shown in Appendices B6.4-2
and B6.4-3.

TABLE 6.4.5


RELATIONSHIP OF STAGE B EVENT TREE TOP EVENTS TO APU
ESDs 2, 3, AND 4 - ORBIT AND ENTRY/DESCENT/LANDING*


Questions From Appendices B6.3-2 Through B6.3-4


"No  Permanent  APU  Failures" 

This  Box  Represents  Both  Start  and  Run  Failures.  None 
of  the  Start  Failures  were  Identified  as  Potentially 
Leading  To  Spatial  Interaction  Events.  Start  Failures 
Separated  from  Run  Failures  to  Accurately  Quantify 
Failures  Which  Could  Not  Occur  After  Wheelstop. 

"Turbine  Speed  Control  OK"  and  all  Boxes  Beneath  this 
Question 

"No  Permanent  APU  Failures"  and  all  Boxes  Beneath  this 
Question

"Hydrazine Does Not Overheat After Shutdown"

"Overheating Does Not Cause Detonation"

"Temperature Stays Above Minimum for Hydrazine, Lube
Oil, and Gas Generator"

"Sufficient  Oxygen  for  Fire  in  AFT  Compartment" 
"Unisolated  Leak" 

"Fire  in  AFT  Compartment" 

All Questions Following "SIE". They Include:

"SIE Does Not Fail Flight Critical Equipment"

"SIE and Initial Failure Does Not Cause Two APUs to
Fail"

"SIE and Initial Failure Does Not Cause the Second
APU to Fail With One Already Failed"

The  Above  Questions  Relate  to  Spatial  Interactions  that 
Follow  Failures  Involving  Shrapnel 


Event
HB, GB

M1, M2,
M3

FB

D1, D2,
D3


TABLE  6.4.5  (Continued) 


Questions From Appendices B6.3-2 Through B6.3-4


"Exhaust Gas Boundary Remains Intact" and all Spatial
Interaction Questions Beneath it. The Spatial
Interaction Questions Refer to the Damage Potentially Caused
by Exhaust Gas Release

"Fuel Bound Areas Remain Intact" and all Spatial
Interaction Questions Beneath it. The Spatial Interaction
Questions Refer to the Damage Potentially Caused by
Hydrazine in the Aft Compartment.

"Fuel Boundaries Remain Intact"


"Hydrazine Boundary Remains Intact"

All Questions Beneath "Hydrazine Boundary Remains
Intact" in Appendix B6.3-2

All Questions Beneath "Fuel Boundary Remains Intact" in
Appendix B6.3-3

All Questions Following "SIE". The Spatial Interaction
Questions now Refer to Damage of Flight Critical
Equipment or APUs Potentially Caused by Hydrazine in the Aft
Compartment

"APU  Fuel  Quantity  and  Tank  Pressure  can  Support  Start 
and  Landing" 

"Leak  Isolated" 

"Hot  Restart  Without  Detonation"  and  all  Questions  that 
Follow  it 

"Remaining  Fuel  Quantity  and  Tank  Pressure  Sufficient  to 
Support Landing"

"Tank Pressure Sufficient for Restart"

"Sufficient Oxygen to Support Fire"

"Fire in Aft Compartment"

TABLE 6.4.5 (Concluded)


Event


R1, R2,
R3


T1, T2,
T3


TE, RE


PW


Questions From Appendices B6.3-2 Through B6.3-4


All Questions Following "SIE". These Spatial
Interaction Questions now Refer to the Damage Potentially
Caused by Hydrazine in the Aft Compartment to an
Individual APU.

"No  Seal  Cavity  Leak"  and  Questions  Below  it  in 
Appendix  B6.3-3 

"Leak  Detected  Before  Blackout"  and  Questions  to  the 
Right  of  it  in  Appendix  B6.3-3 

"No  Recoverable  Failures" 

Spurious  Shutdowns  and  Isolatable  Leaks  were  Modeled 
as  Recoverable  Failures 

"Start  Recoverable  APU  Before  TAEM" 

"Recovered  APU  Operates  OK" 

"Has  Wheelstop  Occurred." 


* Stage B Event Trees are Shown in Appendices B6.4-2
and B6.4-3.

b. Two APU failures lead to loss of crew or vehicle (LV).

c. All failures except spurious shutdown and detected
leakages are modeled as permanent (non-recoverable).

d. The event tree, split fraction models and quantification
reflect the following Flight Rules (Reference 39)
wherever applicable: 10-19, 10-20, 10-22, 10-23, 10-24,
10-25, 10-27, 10-28, 10-29, 10-31, and 10-36.

e.  A "large"  hydrazine  leak  is  defined  as  a leak  for  which 
the  APU  would  deplete  all  usable  fuel  before  the  end  of 
the  mission. 

f.  APU  failures  that  occurred  after  wheelstop  were  modeled. 
However,  the  frequency  of  these  failures  leading  to 
LOC/V  is  believed  to  be  negligible  and  is  not  quantified. 

g. With one exception, the APUs are assumed to be identical and
spatially symmetrical to each other so that frequencies and
consequences are independent of which APU has failed. This
allowed APU 3 to be assigned as the failed APU with no loss
of generality or quantitative accuracy when the failures
under TA, PA, or HA occur. The exception was leakage. The
conditional probability of failing APU 3, given a leak in APU
1 or APU 2 or both (Top Event C3), was lower than the
conditional probability of failing APU 1 or 2, given a leak in
either or both of these APUs. Similarly, the conditional
probability of failing either APU 1 or 2 due to a leak in APU
3 (Top Events C1 and C2) is much lower than the conditional
probability of APU 3 failing itself.

h.  The  possibility  of  two  APUs  failing  independently  in  the  same 
mission  from  turbine  overspeed  is  not  modeled  because  the 
frequency  of  this  sequence  is  much  smaller  than  the  frequency 
of  sequences  leading  to  loss  of  crew  or  vehicle  that  involves 
one  turbine  overspeed  with  other  failures. 

i. A spurious shutdown that occurs later than 3.5 minutes before
TAEM was assumed to be non-recoverable in time to support the
remainder of the mission.

j. The APUs were modeled as if each one had its own auto
shutdown inhibit switch (a post-51L modification).

Any APU exhibiting a malfunction which by Flight Rules
would cause the MCC to declare it lost on orbit is
assumed to be started, if needed [remainder of sentence
illegible]. Failures occurring after TIG-5 minutes are
assumed to be restartable, if needed, at TAEM-3.5 minutes.
Spurious shutdowns that occurred during ascent or during FCS
checkout are assumed to be started, if needed, at EI-13,
with auto shutdown inhibit in effect.

Hot  restarts  are  modeled  in  Stage  B and  include  failure 
of  the  injector  cooling  system  and  the  potential  for 
detonation  if  injector  cooling  fails. 

Any failed APU can also exhibit a hydrazine leak. The
potential spatial interactions from that leak were
included in the model.

Automatic shutdown is assumed to be inhibited (unless
the circuit fails) for any attempted restart or any
start of an APU with another having already failed.

One APU which suffers a spurious shutdown during Stage
B with no other failed APUs will not be restarted.

Three normally recoverable failures occurring before
wheelstop are considered loss of crew and vehicle.
This is because the second and third failures would
have to occur in spite of auto shutdown being
inhibited, and would thus be irrecoverable.

Hydrazine overheating due to loss of fuel pump/GGVM cooling
is judged to be an insignificant contributor to risk. This
cooling system is employed only in certain abort cases whose
considerations are outside the scope of this study and,
therefore, is not quantified.

Stage B split fraction models were quantified independently
of Stage A. This means that independent failures of redundant
components that occurred in Stage A were treated as not failed
at the start of Stage B. This is considered an acceptable
simplification because this phase (Stage A) represents less
than 1% of the total mission time during which these failures
could possibly occur.

Small leakages are treated as being undetectable during Stage
B. However, the model does provide for shutdown of an APU
whose pump seal was leaking before blackout. The model
provides for failing an APU as a result of a leak into the
solenoid cavities or as a result of an unisolatable external
leak. For all other leaks, a running APU is conservatively
modeled as continuing to run without being shut down or
restarted. This treatment is consistent with the experience
of STS-9, when the leak was not detected until too late to
shut down the APUs.
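
The simplification of quantifying Stage B independently of Stage A rests on Stage A being a small share of the exposure time. A rough sketch of that argument follows, assuming a constant failure rate (an exponential model that the text does not state); the rate and durations are hypothetical.

```python
import math


def failure_prob(rate_per_hr: float, hours: float) -> float:
    """Probability of at least one failure in `hours`, assuming a
    constant failure rate (exponential model; an assumption here)."""
    return 1.0 - math.exp(-rate_per_hr * hours)


rate = 1.0e-4                    # hypothetical failures per hour
exposure_hr = 200.0              # hypothetical total exposure time
stage_a_hr = 0.01 * exposure_hr  # Stage A taken as 1% of the exposure

# Share of the failure probability that accrues during Stage A.
stage_a_share = failure_prob(rate, stage_a_hr) / failure_prob(rate, exposure_hr)
```

Under these assumptions the Stage A share is close to the time fraction itself, which is why carrying Stage A component failures into Stage B would change the result only slightly.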


6.4.5 Description of Stage B Top Events

A summary  description  of  each  top  event  and  its  relationship  to 
the  rest  of  the  Stage  B Event  Tree  is  provided  in  this  section. 
The  detailed  model  that  provides  the  basis  for  assessing  the 
frequency  of  occurrence  of  each  top  event  split  fraction  is 
provided  in  Section  6.5.  The  data  required  to  quantify  these 
models  is  described  in  Section  7. 

Top Events SS and DS: APUs Fail to Start

These events include all start failures of APUs either at
deorbit TIG-5 minutes or at EI-13 minutes. Event SS represents
failure of one APU to start; event DS represents failure of a
second or third APU to start, given that one APU has already
failed. These failures are malfunctions that occur from APU
equipment failures occurring at start attempt. These failures
cannot be recovered. Therefore, the occurrence of DS implies
loss of crew and vehicle. The occurrence of SS implies that
one APU is lost for Stage B and that the failure of one more
APU would cause loss of crew and vehicle.
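
The relationship between SS and DS can be illustrated with a small calculation. This sketch assumes three APUs that fail to start independently with the same probability, ignoring common-cause effects; the function name and the probability used are hypothetical and do not come from the study.

```python
def start_failure_split_fractions(p_start_fail: float) -> tuple:
    """Return (P(SS), P(DS | SS)) for three identical, independent APUs.

    SS: at least one APU fails to start.
    DS: at least one more APU fails to start, given SS has occurred.
    """
    p, q = p_start_fail, 1.0 - p_start_fail
    p_at_least_one = 1.0 - q ** 3
    p_at_least_two = 3.0 * p ** 2 * q + p ** 3
    return p_at_least_one, p_at_least_two / p_at_least_one


# Hypothetical per-APU start failure probability.
ss, ds_given_ss = start_failure_split_fractions(0.01)
locv_from_start_failures = ss * ds_given_ss
```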


Top  Event  TB:  Turbine  Overspeed 

This  event  occurs  if  both  the  primary  and  secondary  fuel  control 
valves  fail  in  the  open  position  while  the  APU  is  operating  and 
the  overspeed  trip  fails  to  close  the  secondary  valve.  Occurrence 
of  this  event  after  a previous  APU  failure  would  not  require 
failure  of  the  overspeed  trip  because  the  auto  shutdown  function 
would  have  been  inhibited.  Closure  of  the  fuel  tank  isolation 
valves following an overspeed trip may not prevent turbine
runaway and shrapnel caused by turbine runaway. The quantity of
hydrazine downstream of the isolation valves may be sufficient,
given the presence of bubbles or effective suction by the APU
fuel pump, to allow the turbine to reach breakup speed.

Mechanical, electrical, and controller causes of turbine
overspeed were included. Turbine overspeed implies that the APU has
failed. It was then appropriate to ask if the resulting shrapnel
and hydrazine escape could have caused a second APU or other
flight critical equipment (i.e., top event CB) to fail. The tree
also asks if another APU could have failed independently from the
turbine overspeed either by equipment failure (e.g., top event
DB) or by leakages. Occurrence of this event leads to failure of
one APU and to a release of hydrazine into the aft compartment.
Failure of another APU as a consequence of the shrapnel and
hydrazine release is treated in event CB.

Top Event PB: APU Equipment Failure After APU Start

This event occurs if any equipment failure or failures combine to
prevent an APU from providing sufficient power to its hydraulic
pump as defined above. For example, this event includes break-up
of the turbine rotor at normal speed, and heater failures. Heater
failures were quantified for the orbit period. This event does
exclude, however, turbine overspeed, leakages, spurious shutdowns,
and start failures. This top event does not include failures
caused by erroneous commands from sources external to the APU
(e.g., from the crew or the MCC). These failures are outside the
scope of this study. The combinatorial failures included in this
top event are described in detail in Section 6.5. Occurrence of
this event leads to failure of one APU. If turbine break-up has
occurred, shrapnel- and hydrazine-related spatial interaction
events are accounted for in event CB.

Top Event DB: Failure of Second APU After APU Start

This event is asked if either PB or TB has occurred. It includes
failure of a second APU given that one APU is known to have failed.
The same combinations of equipment failures that contribute to PB
are also relevant here. Occurrence of this event after launch
leads to LOC/V.


Top Event CB: Spatial Interaction Failure of Second APU or
              Flight Critical Equipment

This event includes failure of a second APU or flight critical
equipment due to shrapnel or hydrazine-induced propagating damage.
It considers the possibility that shrapnel and hydrazine leakage
could be produced by turbine rotor break-up, either in an
overspeed or normal speed condition. The sequence of events involving
both TB and CB, then, would lead to LOC/V from turbine shrapnel or
leaking hydrazine. The sequence of events involving both PB and
CB would be caused by one of the failures included in the PB split
fraction model, namely, turbine rotor breakup. The subsequent
events are identical to those for TB and PB, but with a different
frequency.

Top Event HB: Exhaust Gas Leakage or Detonation of GGVM

This  event  includes  the  possibility  that  exhaust  gas  leakage  can 
fail  an  APU.  It  also  includes  the  possibility  that  hydrazine 
leaks  into  the  solenoid  cavity  of  one  of  the  fuel  control  valves, 
autodecomposes,  and  ruptures  the  valve  cover  such  that  hydrazine 
escapes  into  the  aft  compartment.  A large  hole  is  conservatively 
assumed  to  be  formed  and  the  APU  is  assumed  to  be  lost. 

Top Event GB: Exhaust Gas Leakage or Detonation of Isolation
              Valve

This event includes the possibility that exhaust gas leakage fails
a second APU given that one APU is known to have failed from
exhaust gas leakage or from other causes. It also includes the
possibility that hydrazine leaks into the solenoid cavity of one
of the fuel tank isolation valves, autodecomposes, and ruptures
the valve cover such that hydrazine escapes into the aft
compartment. This leakage is assumed to be unisolatable and large. It
allows the contents of the fuel tank to enter the aft compartment.
The conditional probability of failing another APU or flight
critical equipment with the contents of the fuel tank emptied
into the aft compartment was so large that this event has been
assigned to the loss of crew or vehicle damage state.

Top Event M1: Hydrazine Leakage in APU 1

This event includes leakages of hydrazine into the aft compartment
from anywhere in APU 1, except those leakages covered in HB and GB
above, and those from the fuel pump seal into the drain line.

Top Event M2: Hydrazine Leakage in APU 2

This event includes leakages of hydrazine into the aft compartment
from anywhere in APU 2, except those leakages covered in HB and GB
above, and those from the fuel pump seal into the drain line.

Top Event M3: Hydrazine Leakage in APU 3

This event includes leakages of hydrazine into the aft compartment
from anywhere in APU 3, except those leakages covered in HB and GB
above.

The event tree structure involving M1, M2, and M3 includes all
combinations of APUs leaking individually or together in the same
mission. After the questions about leakage, it was appropriate
to ask about potential cascading damage caused by hydrazine
release. Leakage was quantified from the end of Stage A through
APU shutdown, including orbit.

Top Event FB: Leakage Induced Failure of Two APUs or Flight
              Critical Equipment

This event includes those spatial interactions stemming from the
presence of hydrazine in the aft compartment that could cause
failure of at least two APUs or other flight critical equipment.
For scenarios in which one APU has already failed, this event
includes failure of a second APU or flight critical equipment.
Occurrence of this event before wheelstop leads to loss of crew
and vehicle.

Top Event D1: Leakage Induced Failure of APU 1

This event includes spatial interaction induced failure of APU 1
from the presence of hydrazine in the aft compartment, given that
two APUs have not already failed.

Top Event D2: Leakage Induced Failure of APU 2

This event includes spatial interaction induced failure of APU 2
from the presence of hydrazine in the aft compartment, given that
two APUs have not already failed.

Top Event D3: Leakage Induced Failure of APU 3

This event includes spatial interaction induced failure of APU 3
from the presence of hydrazine in the aft compartment, given that
two APUs have not already failed.

In any sequence in which any APU is leaking, D1, D2, and D3 are
asked in order to account for the potential of leakage from one APU
failing another APU. Although the leakages themselves (occurrence
of M1, M2, or M3) are potentially recoverable if needed to support
landing, the additional occurrence of D1, D2, or D3 implies a
permanent, non-recoverable failure.

Top Events R1, R2, R3: Seal Cavity Leaks

These events include the fraction of leakages that occur before
EI-13, and those that occur through the fuel pump seal into the
seal drain line for APU 1 (R1), APU 2 (R2), and APU 3 (R3).
Should any of these types of leakages be detected in the
absence of an APU failure, Flight Rules indicate that the
leaking APU would be shut down, and restarted only if needed
for landing. The model assumes that such restarts are made
at TAEM-3.5 minutes. Should these events occur during a
scenario that includes a previous failure of an APU, then the
model assumes that the leaking APU will continue to operate.

Top Events T1, T2, and T3: Spurious Shutdown

These events include equipment failures of APU 1 (T1), APU 2
(T2), or APU 3 (T3) that would cause a spurious shutdown of the
affected APU. For example, MPU 1 failing high could cause the
controller to sense an overspeed and shut down the APU. If one
APU has exhibited a spurious shutdown and no other APUs have
failed or have been declared lost, then the model assumes that
the APU experiencing the spurious shutdown is not restarted
because it is not needed. If the spurious shutdown occurs after
TAEM-3.5 minutes, then the APU is considered lost. Otherwise,
the APU will be recovered at TAEM-3.5 minutes. If a scenario
includes two spurious shutdowns before TAEM-3.5 minutes, one (the
second shutdown that occurred) represents a permanent failure
because auto shutdown would have been inhibited after the first
APU failed. The model assumes that recovery of the APU that
failed first is attempted at TAEM-3.5 minutes.
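
The recovery rules for spurious shutdowns can be summarized in a short sketch. It is a simplified reading of the paragraph above, not the study's model: times are minutes relative to TAEM (negative means before TAEM), only spurious shutdowns are considered, and the function and status names are hypothetical.

```python
CUTOFF_MIN = -3.5  # TAEM-3.5 minutes, expressed relative to TAEM


def classify_spurious_shutdowns(times_min):
    """Classify each spurious shutdown, in order of occurrence.

    - after TAEM-3.5 minutes: the APU is considered lost
    - first shutdown before the cutoff: recoverable at TAEM-3.5 minutes
    - later shutdowns before the cutoff: permanent, because auto
      shutdown would have been inhibited after the first failure
    """
    statuses = []
    for index, t in enumerate(sorted(times_min)):
        if t > CUTOFF_MIN:
            statuses.append("lost")
        elif index == 0:
            statuses.append("recoverable")
        else:
            statuses.append("permanent")
    return statuses


two_early = classify_spurious_shutdowns([-20.0, -10.0])
one_late = classify_spurious_shutdowns([-1.0])
```

Whether a recoverable APU is actually restarted depends on whether it is needed, which this sketch does not model.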

Top Event TE: Failure Occurs After TAEM-3.5 Minutes

This event includes the fraction of all APU failures that occur
after TAEM-3.5 minutes. All such failures are assumed to be
non-recoverable. Two such failures, including spurious shutdowns,
are assumed to lead to loss of crew and vehicle. This time was
selected because analysis groundrules dictate that two APUs are
required for TAEM and the approach and landing phases of entry.
The 3.5 minute margin accounts for the injector cooling hot
restart procedure required to restart a previously shut down APU.
The model conservatively ignores the APU cool down procedure.

Top Event PW: Failure Occurs After Wheelstop

This event includes those failures that occur after wheelstop.
This is significant because the APUs are no longer needed after
wheelstop; APU failures cannot cause a loss of crew or vehicle
unless the failure causes a catastrophic explosion or fire.

All APU failures that occur after wheelstop have been modeled.
However, the frequency of failure is believed to be negligible.
Therefore, they do not contribute to the risk of loss of crew
or vehicle.

Top Event RE: Failure to Recover APU

Event RE asks if an APU that had been shut down by MCC call or
had experienced a spurious shutdown during entry was
successfully restarted. It includes failure of injector cooling with
subsequent potential for detonation of the APU. Occurrence of
this event leads to a loss of crew and vehicle. The fact that
the restart was attempted indicates that the APU was needed to
support landing.


6.5  SPLIT  FRACTION  MODEL  DEVELOPMENT 


6.5.1  Principles  of  Model  Development 

A guiding  principle  for  the  modeling  and  computational  effort 
was  to  place  more  emphasis  and  detail  in  those  aspects  of  the 
model  that  promised  to  be  important  to  risk.  This  meant, 
for  example,  that  many  scenarios  involving  large  numbers  of 
failure  occurrences  would  not  be  important  because  of  their 
low  associated  probabilities.  Such  scenarios  could  be  quickly 
estimated  by  a preliminary  analysis  using  a general  knowledge 
of  the  model  and  the  basic  event  data.  It  was  not  difficult,  for 
example, to estimate the order of magnitude of the total LOC/V
frequency  from  a knowledge  of  the  event  tree,  APU  design,  and 
the  failure  history  database  without  going  through  the  formal 
computer  analysis.  However,  in  some  cases  knowledge  to  make 
such  initial  assessments  was  not  available  until  late  in  the 
study.  It  was  necessary  to  include  such  events  in  the  analysis. 
One  of  the  most  prominent  examples  is  the  case  of  consequential 
permanent  failures  resulting  from  exhaust  gas  leaks.  Exhaust 
gas  leaks  were  identified  in  the  master  logic  diagrams  as  an 
initiating  failure  and  were,  therefore,  included  in  the  event 
trees. Their frequency of occurrence and the conditional
probability of consequential failure of an APU were not assessed
until  well  after  the  event  trees  had  been  completed  and  while 
the  split  fraction  models  were  under  development.  Their 
contribution  to  risk  was  found  to  be  negligibly  small  (less 
than  0.1  per  cent  of  the  total  LOC/V  frequency).  The  exhaust 
leak models are, therefore, more complex than necessary.

In developing the interrelated event tree and fault tree models,
it  was  also  necessary  to  strike  a balance  in  modeling  complexity 
between  these  two  types  of  logic  trees.  This  was  an  iterative 
process  that  began  by  developing  a simple  first-cut  event  tree 
and  its  associated  fault  trees.  The  fault  trees  were  found  to  be 
too  complex  to  be  analyzed  easily.  This  led  to  a more  complex 
event  tree,  and  the  associated  fault  trees  were  found  to  be  much 
more  reasonable.  This  iterative  process  was  continued  until  a 
reasonable  balance  was  achieved. 

The  fault  tree  analysts  also  had  to  be  aware  of  the  data  analysis 
because,  as  discussed  in  the  Study  Methodology  Section  (Section 
5), it is pointless to model components at a level below that for
which data exist. Furthermore, the availability of data in a
particular  form  influences  the  way  the  fault  tree  analyst  chooses 
to  express  the  basic  events.  The  process  of  split  fraction 
modeling  is  iterative  and  highly  interactive  with  the  event  tree 
development  and  data  analysis  processes. 

As  indicated  in  Section  6.4,  the  event  tree  for  APU  Stage  A is 
a logic  diagram  that  shows  the  various  admissible  combinations 
of  top  event  occurrences  and  nonoccurrences  that  constitute 
the  various  scenarios  to  be  analyzed.  In  order  to  compute  the 
scenario  occurrence  frequencies,  it  is  first  necessary  to  compute 
the  appropriate  split  fractions  for  the  top  events  appearing  in 
each  scenario.  In  some  cases,  these  split  fractions  are  single 
numbers  determined  from  all  available  evidence,  as  described 
in Section 5. In other cases, the top events represent a
substantial part of the APU, and the corresponding split fractions
were computed from fault tree analyses. The paragraphs that
follow describe the fault trees that were developed for
calculating the split fractions for the event tree top events. The
outcome of the split fraction models, when evaluated by the data
for the basic events, is a set of split fraction Cause Tables as
described in Section 5 and as shown in the Quantitative Results
Section (Section 8).
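
The way split fractions combine into scenario frequencies can be sketched as follows. This is a generic event-tree calculation, assuming each branch's split fraction is already conditioned on the path leading to it; the two top events and their values are hypothetical.

```python
from math import prod


def scenario_frequency(entry_freq: float, branch_fractions: list) -> float:
    """Frequency of one event-tree path: the entry frequency times the
    split fraction of every branch taken along that path."""
    return entry_freq * prod(branch_fractions)


# Hypothetical failure-branch split fractions for two top events.
f_first, f_second = 1.0e-3, 1.0e-2
entry = 1.0  # the tree is entered once per mission

paths = {
    "no failure": scenario_frequency(entry, [1 - f_first]),
    "one failure": scenario_frequency(entry, [f_first, 1 - f_second]),
    "two failures": scenario_frequency(entry, [f_first, f_second]),
}
```

Because every branch point splits into complementary fractions, the path frequencies sum to the entry frequency, which is a convenient check on a hand-built tree.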


6.5.2  General  Groundrules  and  Assumptions 

Before describing the fault trees, it is appropriate to describe
some general ground rules, assumptions, and analysis
considerations that are fundamental to all of the fault trees. One of
the assumptions concerns the asymmetry in APU physical locations.
The main area in which this consideration might be significant is
in spatial interactions, that is, in the area of cascading
failure of an APU following the failure of some other APU. It was
decided to simplify the analysis by assuming that the APUs are
symmetrical with respect to physical location. That is, all APUs
are assumed to be co-located together in the aft compartment in
the same way that APU 1 and APU 2 are co-located. This is a
conservative assumption. Because of this assumption, there is no
uniqueness to the names of the APUs. Thus, if an unidentified,
unnamed APU fails in conjunction with one of the top events in
the event tree (call that Event E1), then that failed APU can be
"named" APU 3 without any loss of generality. The actual name of
that failed APU is of no importance in determining probabilities.

Consider now some other top event (call it E2) that appears to the
right of event E1 in the event tree. Fault tree models can now be
constructed  for  event  E2  in  which  the  failed  APU  3 does  not  appear. 
This  represents  a great  simplification  in  the  modeling  process. 

Another  simplifying  assumption  is  that  the  failure  of  either 
isolation  valve  to  close  for  APU  shutdown  is  a permanent  failure. 
This represents a slight conservatism with respect to the
potential  recovery  procedures  allowed  in  Flight  Rule  10-11C,  but 
it  greatly  simplifies  the  analysis  process.  Were  it  to  have  been 
found  that  this  failure  mode  yielded  a significant  contribution 
to  LOC/V,  then  the  models  could  have  been  changed  to  reflect  the 
recovery  process  allowed  in  the  flight  rules  and  the  calculations 
revised  to  show  the  effect. 


6.5.3 Treatment of Exhaust Duct Leakage

After some preliminary modeling and quantification of exhaust duct
leakage,  it  was  concluded  that  exhaust  duct  leakage  would  be  a 
negligible  contributor  to  loss  of  crew  or  vehicle.  The  reasons 
for  this  are  as  follows: 

a.  The frequency of occurrence of exhaust duct leakage either
from shrapnel or from random failure is very low
(approximately one occurrence in one hundred thousand hours of APU
operation).

b.  Exhaust duct leakage does not constitute loss of the APU.

c.  The probability of failure of an APU or of flight critical
equipment in the aft compartment as a consequence of exhaust
gas impingement is quite low (between one in one hundred and
one in one thousand per leak).

d.  Therefore, it was expected that a LOC/V due to exhaust gas
leak would occur approximately once in ten million missions.

Rather than produce a detailed quantification for such a remote
occurrence, the effort was simplified and the frequency of all
scenarios associated with exhaust duct leaks was assessed as
negligible, even though a detailed model had already been developed.
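
The order-of-magnitude argument in items a through d can be checked
with a few lines of arithmetic. The sketch below is illustrative
only. The leak frequency and the conditional failure range are the
approximate values quoted above; the exposure value passed to
locv_per_mission is a hypothetical number that is not given in this
section, and the sketch pessimistically treats every consequential
failure as a LOC/V.

```python
# Rough screening estimate for LOC/V caused by an exhaust duct leak.
# All names are illustrative. apu_hours_per_mission is an ASSUMED
# exposure, not a value taken from the study.

LEAK_RATE_PER_APU_HOUR = 1.0e-5        # item a: about 1 leak per 100,000 APU hours
P_FAIL_GIVEN_LEAK = (1.0e-3, 1.0e-2)   # item c: between 1 in 1,000 and 1 in 100


def locv_per_mission(apu_hours_per_mission: float) -> tuple[float, float]:
    """Return the (low, high) frequency per mission of a leak-induced failure.

    Every consequential failure is counted as a LOC/V, which overstates
    the risk but is adequate for a screening argument.
    """
    leaks_per_mission = LEAK_RATE_PER_APU_HOUR * apu_hours_per_mission
    low = leaks_per_mission * P_FAIL_GIVEN_LEAK[0]
    high = leaks_per_mission * P_FAIL_GIVEN_LEAK[1]
    return low, high


low, high = locv_per_mission(apu_hours_per_mission=1.0)  # assumed exposure
print(f"{low:.1e} to {high:.1e} per mission")
```

With the assumed exposure of one APU hour per mission, the upper end
of the range corresponds to roughly one occurrence in ten million
missions, the figure quoted in item d.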


6.5.4  Treatment  of  Dependencies  in  the  Split  Fraction  Models 

Prior  experience  shows  that  common  cause  failures  tend  to  be 
important  risk  contributors  because  multiple  failures  can  occur  as 
a result  of  a single  failure  condition  common  to  two  or  more  units. 
Usually  this  is  at  a substantially  higher  probability  than  that 
associated  with  multiple  independent  failures.  Hence,  it  was 
important  to  include  such  potential  contributors  wherever  they 
were indicated by the recorded APU and HPU failure history
databases.

In most cases the fault trees are intended to provide
probabilistic results that serve directly as the split fractions
for  their  associated  top  events.  In  some  cases,  however,  the 
fault  trees  provide  intermediate  results  that  must  be  combined 
with  other  models  to  obtain  the  required  top  event  split 
fractions.  For  example,  two  consecutive  top  events  in  the  event 
tree  in  Figure  6.4.1  are  labeled  PA  and  DA.  PA  represents  the 
event  in  which  one  or  more  APUs  have  a permanent  failure,  while 
DA  represents  the  event  in  which  at  least  two  APUs  fail  given 
that  at  least  one  has  failed.  The  fault  tree  for  PA  yields  the 
associated  split  fraction  directly.  However,  the  fault  tree 
for  DA  yields  the  probability  of  at  least  two  APU  failures.  To 
obtain  the  split  fraction  for  the  DA  event,  divide  the  DA  result 
by  the  PA  result,  thereby  giving  the  probability  of  two  or  more 
APU  failures  given  that  one  or  more  failures  are  known  to  have 
occurred.  This  type  of  analysis  also  applies  to  the  top  events 
HA  and  GA  in  that  same  event  tree. 
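
The division described above is an ordinary conditional probability:
P(two or more | one or more) = P(two or more) / P(one or more). The
sketch below illustrates it for three identical APUs that fail
independently with a per-APU probability p. Independence and the
value of p are assumptions made only for illustration; the study's
fault trees also include common cause contributors, so the actual PA
and DA results are not binomial.

```python
# Conditional split fraction for DA under an ASSUMED model of three
# identical, independent APUs. Function names are illustrative.
from math import comb


def p_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent units fail."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def da_split_fraction(p: float, n_apus: int = 3) -> float:
    """P(two or more fail | one or more fail) = P(>=2) / P(>=1)."""
    pa_result = p_at_least(1, n_apus, p)  # role of the PA fault tree result
    da_result = p_at_least(2, n_apus, p)  # role of the DA fault tree result
    return da_result / pa_result


print(da_split_fraction(0.01))  # hypothetical per-APU failure probability
```

Because the event "two or more" is contained in the event "one or
more", the quotient always lies between 0 and 1 and can be used
directly as a split fraction.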


6.5.5  Treatment  of  Order  of  Occurrence  in  the  Models 

Event  trees  are  simply  logic  diagrams  that  indicate  what  specific 
combinations  of  events  occur  and  do  not  occur;  such  trees  do 
not  ordinarily  convey  any  information  as  to  the  order  in  which 
events  occur.  Thus,  the  fault  tree  models  have  to  be  carefully 
constructed  to  account  for  order,  when  order  is  of  concern.  For 
example, in the Stage A event tree shown in Figure 6.4.1, there
are top events labeled TA and DA. TA accounts for the potential
for a turbine runaway, and DA accounts for the possibility of
a second independent permanent failure of an APU. Since the TA
event appears first in the event tree, the fault tree for TA
models the potential for a runaway of one out of three APUs.

The DA event must then consider the implications of the order in
which the two events occur. If the TA event occurs first (which
is taken to occur with a probability of 50%), then the TA analysis
based on one APU failing out of three is correct, and the DA fault
tree must consider the potential for one APU to fail out of two
(because the third APU, which is named APU 3, has already failed by
runaway). However, if DA occurs first (with a probability of 50%),
then the DA fault tree must be based on one out of three failing,
and the TA fault tree should be based on one out of two. Since the
TA analysis is already based on one out of three, a correction
factor must be included in the DA fault tree to correct from the
1-out-of-3 TA analysis to the proper 1-out-of-2 basis needed for TA
in this case. In summary, some complexity is added to the fault
trees to accurately account for the order in which top events in
the event tree could occur. Such correction factors will be found
in a number of the fault trees, and the "secondary" fault
trees needed to cover the 1-out-of-2 case for TA (and other such
top events) are also presented below. The specific TA/DA case
mentioned here is discussed with the appropriate fault trees below.


6.5.6  Nomenclature 

A special naming convention has been used in all of the fault
trees. The first two characters of the event names are the same
as the two characters in the top event for which the fault tree
was developed. For the basic events, the third and fourth
characters identify the type of component being modeled, and the
fifth character identifies the particular failure mode. For
gates, the third, fourth, and fifth characters identify the level
of the gate in the fault tree and distinguish gates at each level.
The last (sixth) character is 1, 2, or 3 to identify the specific
APU in which the component or gate resides. If the last character
is a 0, then it identifies a generic component or gate, that is,
something (such as a common cause failure) not associated with any
specific APU.

To simplify the general appearance of the fault trees, they are
shown in full only for APU 1. That detailed development is shown
as a transfer with a label of the form XY1. The other two APUs
are then represented as transfers in with labels of the form XY2
and XY3. In those subtrees, all gates and basic events in the
subtree XY1 that end with a 1 are converted to a 2 or a 3 for the
corresponding subtrees XY2 and XY3, respectively.
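
The naming convention and the relabeling rule for the APU 2 and
APU 3 subtrees can be expressed as a small parser. This is a sketch
only: the class and function names are invented here, and the event
names PAFPR1 and PACCF0 used in the example are hypothetical, not
taken from the appendices. A name alone does not reveal whether it
denotes a basic event or a gate, so the three middle characters are
kept together.

```python
# Sketch of the six-character event naming convention (names are illustrative).
from dataclasses import dataclass


@dataclass(frozen=True)
class EventName:
    top_event: str  # characters 1-2: top event the fault tree was built for
    body: str       # characters 3-5: component type + failure mode, or gate level/id
    apu: int        # character 6: 1, 2, or 3 for a specific APU; 0 for generic

    @property
    def is_generic(self) -> bool:
        return self.apu == 0


def parse_event_name(name: str) -> EventName:
    """Split a six-character event name into its fields."""
    if len(name) != 6 or name[5] not in "0123":
        raise ValueError(f"not a six-character event name: {name!r}")
    return EventName(top_event=name[:2], body=name[2:5], apu=int(name[5]))


def relabel_for_apu(name: str, apu: int) -> str:
    """Convert an APU 1 name to its APU 2 or APU 3 counterpart.

    Only names ending in 1 are converted; generic names (ending in 0)
    are returned unchanged.
    """
    if apu not in (1, 2, 3):
        raise ValueError("apu must be 1, 2, or 3")
    if parse_event_name(name).apu != 1:
        return name
    return name[:5] + str(apu)


print(parse_event_name("PAFPR1"))     # hypothetical basic event in APU 1
print(relabel_for_apu("PAFPR1", 3))   # counterpart in the APU 3 subtree
print(relabel_for_apu("PACCF0", 2))   # hypothetical generic event, unchanged
```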

The paragraphs that follow are divided into two main parts: one
for  the  APU  Stage  A analysis,  and  one  for  the  APU  Stage  B analysis. 


6.5.7  Stage  A Analysis 

Top  Event  TA:  Turbine  Overspeed 

The  first  top  event  in  the  Stage  A Event  Tree  shown  in  Figure 
6.4.1  is  TA.  This  event  represents  a specific  type  of  APU 
permanent  failure  — namely,  one  involving  turbine  runaway, 
where  failures  cause  the  turbine  speed  to  increase  above  normal 
operating  levels  and  the  overspeed  protection  system  fails  to 
shut  the  turbine  down.  This  particular  failure  mode  has  been 
separated  from  all  of  the  other  permanent  failures  because  of 
the  high  potential  for  consequential  failure  of  other  APUs 
or  flight-critical  equipment  due  to  the  high-energy  shrapnel 
generated  by  the  overspeed. 

The fault trees developed for TA are shown in Appendices B6.5-1
and B6.5-2. The first fault tree (labeled TA) covers the model
for the case of one runaway out of three APUs, while the second
(labeled TA-D) models the case of one runaway out of two APUs.
The second fault tree is provided to support top events to the
right in the event tree where the order in which events occur
is a consideration.

Both  fault  trees  model  runaway  in  terms  of  having  both  the 
primary  and  secondary  control  valves  failing  open,  together  with 
failure  of  the  overspeed  protection  system  to  shut  down  the  APU 
and  prevent  the  runaway  condition.  The  numerical  result  computed 
from  fault  tree  TA  directly  yields  the  requisite  split  fraction 
for  the  top  event  TA  in  the  event  tree. 

Top  Event  PA:  Equipment  Failure  of  1 APU  After  it  Starts 

The  second  top  event  in  the  Stage  A Event  Tree  shown  in  Figure 
6.4.1 is PA. This event represents all but two contributors to
the  permanent  failure  of  at  least  one  of  the  three  APUs.  The 
two  exceptions  are  (1)  the  turbine  runaway  failures  covered  by 
TA,  and  (2)  the  start  failures,  which  are  more  conveniently 
analyzed in the Top Event BA (the failures occurring before
lift-off and contributing to launch scrub).

The fault trees developed for PA are provided in Appendices B6.5-3
and B6.5-4. The first fault tree (labeled PA) models the
permanent failure of at least one out of three APUs, while the
second one (labeled PA-T) models the permanent failure of at
least one out of two APUs. This second fault tree is provided
to support top events to the right of event PA in the event tree
where the order in which events occur is a consideration.

Both PA fault trees model permanent failures in terms of the
following  primary  failure  modes: 

a.  Fuel  line  blockage 

b.  Fuel  pump  failure 

c.  Low  fuel  tank  pressure 

d.  Turbine  fails  to  run 

e.  Turbine  wheel  shutdown  failure 

f.  Gearbox  fails  to  run 

g.  Gas  generator  run  failure 

h.  Fuel  tank  isolation  valves  fail  closed 

i.  Fuel  depleted  after  shutdown 

j.  Common  cause  failure  of  lube  oil  circulation  due  to 
contamination 

The numerical result computed from Fault Tree PA directly yields
the requisite split fraction for the Top Event PA in the event
tree.

Top  Event  DA:  Failure  of  a second  APU  After  it  Starts 

The third top event in the Stage A Event Tree is DA. This event
represents all but two contributors to the permanent failure of
at least two of the three APUs, where the two exceptions are (1)
the turbine runaway failures covered by TA, and (2) the start
failures, which are more conveniently analyzed in the Top Event
BA (the failures occurring before lift-off and contributing to
launch scrub). The only difference between this event and the
event PA is that DA accounts for at least two out of three APUs
failing, while PA accounts for at least one out of three APUs
failing.

In the event tree, the PA event represents the probability of an
independent permanent failure occurring in at least one APU, while
the DA event represents the probability of an independent
permanent failure occurring in at least two APUs given that at
least one is known to have occurred. The scenario in which PA
occurs and DA does not occur represents the case in which exactly
one APU has an independent permanent failure. The scenario in
which  both  PA  and  DA  occur  represents  the  case  in  which  two  or 
more  APUs  have  independent  permanent  failures.  When  the  TA  event 
occurs  in  the  event  tree,  only  the  DA  event  is  questioned  with 
regard  to  the  occurrence  of  a second  permanent  failure  as  a result 
of  an  independent  cause.  In  this  case,  it  is  not  addressed  via 
Event  PA.  This  is  simply  an  analysis  convention  that  was  adopted 
for  convenience;  this  situation  could  just  as  well  have  been 
addressed  by  using  PA. 

The  fault  trees  developed  for  DA  are  shown  in  Appendices  B6.5-5 
through B6.5-7. Appendix B6.5-5 is the Fault Tree DA1 that
applies  to  the  first  (uppermost)  node  for  DA  in  the  event  tree 
and  models  the  permanent  failure  of  at  least  two  out  of  three 
APUs.  Appendix  B6.5-6  is  the  Fault  Tree  DA2  that  applies  to  the 
second  (lower)  node  for  DA  in  the  event  tree.  This  models 
the  second  permanent  failure  that  occurs  in  conjunction  with  the 
turbine  runaway  failure  modeled  by  the  TA  event,  and  the  fault 
tree  is  in  the  same  basic  form  as  the  PA  Fault  Tree.  The  Fault 
Tree  DAT  in  Appendix  B6.5-7  models  the  case  of  two  permanent 
failures  out  of  two  APUs,  which  is  provided  to  support  top  events 
to  the  right  of  event  DA  in  the  event  tree  where  the  order  in 
which  events  occur  is  a consideration. 

The fault tree for DA2 in Appendix B6.5-6 is the first
illustration of the logic required to account for the order in which
events occur, as discussed in Section 6.5.5. If event TA occurs
first, then the TA 1-out-of-3 fault tree model is correct, and
the DA logic must consider 1-out-of-2 failure logic. This
situation is shown on the right side of the diagram in Appendix
B6.5-6. If, on the other hand, DA occurs first, then the TA
1-out-of-3 logic must be corrected to 1-out-of-2 logic, and the
correct logic for DA is 1-out-of-3. This situation is shown on
the left side of that diagram. The correction factor represented
by the basic event DATCF0 is the ratio of the result from the
TA-D tree to that from the TA tree.
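
The two-branch logic described above can be sketched numerically.
The 50/50 weighting and the definition of the correction factor as
the ratio of the TA-D result to the TA result come from the text.
Everything else is an assumption made for illustration: the APUs are
treated as identical and independent, the per-APU probabilities are
hypothetical, and combining the two branches as a weighted sum is
this sketch's reading of the diagram in Appendix B6.5-6, which is not
reproduced here.

```python
# Order-of-occurrence sketch for the DA node that follows a TA runaway.
# Independence, identical APUs, and all numbers are ASSUMPTIONS.


def p_any(n: int, p: float) -> float:
    """Probability that at least one of n independent units fails."""
    return 1.0 - (1.0 - p) ** n


def ta_correction_factor(p_runaway: float) -> float:
    """Ratio of the 1-out-of-2 runaway result to the 1-out-of-3 result."""
    return p_any(2, p_runaway) / p_any(3, p_runaway)


def da2_split_fraction(p_runaway: float, p_other: float) -> float:
    """Split fraction for a second permanent failure, given a runaway."""
    # TA first: TA is already 1-out-of-3, so DA is drawn from the two
    # remaining APUs.
    ta_first = 0.5 * p_any(2, p_other)
    # DA first: DA is 1-out-of-3, and the 1-out-of-3 TA result used
    # earlier in the event tree is rescaled to a 1-out-of-2 basis.
    da_first = 0.5 * p_any(3, p_other) * ta_correction_factor(p_runaway)
    return ta_first + da_first


print(ta_correction_factor(1.0e-4))        # hypothetical runaway probability
print(da2_split_fraction(1.0e-4, 1.0e-3))  # hypothetical per-APU values
```

For small probabilities the correction factor approaches 2/3, so the
two orderings contribute almost equally and the split fraction is
close to twice the per-APU failure probability.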

All  of  the  fault  trees  needed  for  the  DA  event  model  permanent 
failures  in  terms  of  the  following  primary  failure  modes: 

a.  Fuel  line  blockage 

b.  Fuel  pump  failure 

c.  Low  fuel  tank  pressure 

d.  Turbine  fails  to  run 

e.  Turbine  wheel  shutdown  failure 

f.  Gearbox  fails  to  run 

g.  Gas  generator  run  failure 

h.  Fuel tank isolation valves fail closed

i.  Fuel  depleted  after  shutdown  through  a gearbox  shaft  seal 

j.  Common cause failure of lube oil circulation due to
contamination 

The numerical result from Fault Tree DA1 in Appendix B6.5-5 must
be divided by the numerical result from Fault Tree PA to obtain
the split fraction needed for node 1 for the event DA; this split
fraction  is  the  conditional  probability  of  two  or  more  permanent 
failures  given  that  one  or  more  permanent  failures  have  occurred. 
The  numerical  result  computed  from  Fault  Tree  DA2  in  Appendix 
B6.5-6  directly  yields  the  requisite  split  fraction  for  node  2 
of  Top  Event  DA  in  the  event  tree. 

Top Event CA: Failure of a Second APU or Flight-Critical
Equipment Due to Failure of the First APU

The  fourth  top  event  in  the  Stage  A Event  Tree  is  CA.  This  event 
represents  the  consequential  permanent  failure  of  flight  critical 
equipment  or  of  at  least  one  APU  following  the  permanent  failure 
of  one  other  APU. 

The CA fault trees are shown in Appendices B6.5-8 and B6.5-9.
Appendix B6.5-8 is the Fault Tree CA1 that applies to the first
(uppermost) node for CA in the event tree and models the
consequential failure of flight critical equipment or of at least
one other APU following the nonrunaway permanent failure of one
APU (from Event PA). Appendix B6.5-9 is the Fault Tree CA2
that applies to the second (lower) node for CA in the event
tree. This models the consequential permanent failure of flight
critical equipment or of at least one other APU following a
turbine runaway failure (from Event TA). Separate fault trees
are  required  because  the  potential  for  consequential  failure 
following  a turbine  runaway  is  higher  than  that  for  other  forms 
of  permanent  failure.  The  numerical  results  computed  from  both 
Fault  Trees  CA1  and  CA2  directly  yield  the  requisite  split 
fractions  for  nodes  1 and  2 of  Top  Event  CA  in  the  event  tree. 

Top  Event  HA:  Failure  of  One  APU  Due  to  Exhaust  Gas  Leak 

The  fifth  top  event  in  the  Stage  A Event  Tree  is  HA.  This  event 
represents  the  failure  of  at  least  one  APU  as  a consequence  of  an 
exhaust  gas  leak  in  at  least  one  APU.  The  model  is  based  on  the 
realization  that  the  potential  for  a non-leaking  APU  to  fail  is 
extremely  remote.  Thus,  the  model  only  accounts  for  failures 
of  APUs  that  are  themselves  experiencing  hot  gas  leaks.  This  is 
also  a very  low  frequency,  as  described  earlier. 

The fault trees developed for HA are shown in Appendices B6.5-10
and B6.5-11. The first fault tree, HA1, models the permanent
failure of at least one out of three APUs as a consequence of exhaust
gas leaks, while the second, labeled HAT, models the permanent
failure of at least one out of two APUs as a consequence of exhaust
gas leaks. This second fault tree is provided to support top
events to the right of event HA in the event tree where the order
in which events occur is a consideration.

The  numerical  result  computed  from  Fault  Tree  HA1  directly  yields 
the  requisite  split  fraction  for  the  Top  Event  HA  in  the  event 
tree. 

Top Event GA: Failure of a Second APU or Flight-Critical
Equipment Due to Exhaust Gas Leak

The  sixth  top  event  in  the  Stage  A Event  Tree  is  GA.  This  event 
represents  the  failure  of  at  least  two  APUs  as  a consequence  of 
exhaust  gas  leaks  in  at  least  two  APUs,  given  that  at  least  one 
APU  is  known  to  have  failed  as  a consequence  of  a hot  gas  leak. 

The  model  is  based  on  the  realization  that  the  potential  for  a 
non-leaking  APU  to  fail  is  extremely  remote.  Thus,  the  model  only 
accounts  for  failures  of  APUs  that  are  themselves  experiencing 
hot  gas  leaks. 

The  fault  trees  developed  for  GA  are  shown  in  Appendices  B6.5-12 
through  B6.5-16.  Appendices  B6.5-12  through  B6.5-15  show  four 
different fault trees. The numerical results computed from the
four fault trees are used in the same manner as described above
for event DA to provide the requisite split fractions for the
four nodes of Top Event GA in the event tree. The Fault Tree GAT
shown  in  Appendix  B6.5-16  is  used  in  the  same  manner  as  described 
above  for  Fault  Tree  DAT  for  event  DA. 

Top Events L1, L2, L3: Leakage of Hydrazine From APU 1, 2, or 3

The seventh, eighth, and ninth top events in the Stage A Event
Tree shown in Figure 6.4.1 are Lk, where k can be 1, 2, or 3.

This  event  represents  the  independent  occurrence  of  a fuel  leak 
in  APU  k.  Rather  than  consider  the  logic  for  these  three  top 
events  in  terms  of  a fault  tree  or  a set  of  three  fault  trees, 
it  was  much  simpler  to  express  the  logic  in  terms  of  a simple 
event  tree  as  a means  of  representing  the  probability  values 
needed  for  the  various  combinations  of  leakage  occurrences. 

Event  Tree  LK  is  shown  in  Appendix  B6.5-17.  The  split  fraction 
to  be  used  for  each  node  for  each  top  event  is  shown  at  that 
node. Lambda represents the failure rate with which independent
leakage  occurs,  and  "t"  is  the  time  interval  of  interest  over 
which  the  leak  can  occur.  Beta  represents  a common  cause  factor, 
which  is  a measure  of  the  conditional  probability  that  a second 
APU  has  a fuel  leak  given  that  one  is  already  known  to  be 
leaking.  Lambda  and  beta  can  both  be  derived  from  the  Shuttle 
flight  history  data,  as  discussed  in  Section  7.0. 

An  important  characteristic  of  the  split  fraction  formulas  given 
for  the  various  nodes  in  Appendix  B6.5-17  is  that  the  scenario 
probabilities shown for all scenarios involving exactly one
leaking APU are all identical. The same is true for the
scenarios  with  exactly  two  leaking  APUs.  Also,  the  sum  of  the 
probabilities  for  all  eight  scenarios  is  exactly  one. 

Using  the  leakage  split  fractions  listed  is  simply  a matter  of 
matching  the  nodes  in  that  figure  with  the  corresponding  nodes 
in the event tree in Figure 6.4.1. That is, the split fraction
P21 for node 1 of the event L2 is matched to all nodes in the
event tree for which L2 occurs when L1 does not occur. Likewise,
the split fraction P22 for node 2 of the event L2 is matched to
all nodes in the event tree for which L1 does occur. A similar
approach is used for the nodes for L3.
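
The symmetry property described above (identical probabilities for
all scenarios with the same number of leaking APUs, and a total of
exactly one over the eight scenarios) can be illustrated without the
formulas of Appendix B6.5-17, which are not reproduced here. The
sketch below starts from assumed scenario probabilities q1, q2, and
q3 for one specific APU leaking, one specific pair leaking, and all
three leaking, derives the node-dependent split fractions for L1,
L2, and L3 from them, and then rebuilds the eight scenarios. The
dictionary keys and the numerical values are hypothetical, and the
mapping from lambda, t, and beta to q1, q2, and q3 is not attempted.

```python
# Node-dependent split fractions for a three-event leakage tree.
# q1, q2, q3 are ASSUMED inputs; labels such as "L2|~L1" are invented here.
from itertools import product


def leak_split_fractions(q1: float, q2: float, q3: float) -> dict[str, float]:
    """Derive split fractions from symmetric scenario probabilities."""
    q0 = 1.0 - 3 * q1 - 3 * q2 - q3     # no APU leaking
    p_l1 = q1 + 2 * q2 + q3             # APU 1 leaks, alone or with others
    return {
        "L1": p_l1,
        "L2|L1": (q2 + q3) / p_l1,
        "L2|~L1": (q1 + q2) / (1.0 - p_l1),
        "L3|L1,L2": q3 / (q2 + q3),
        "L3|L1,~L2": q2 / (q1 + q2),
        "L3|~L1,L2": q2 / (q1 + q2),
        "L3|~L1,~L2": q1 / (q0 + q1),
    }


def scenario_probabilities(sf: dict[str, float]) -> dict[tuple, float]:
    """Multiply split fractions along each of the eight event tree paths."""
    out = {}
    for l1, l2, l3 in product((True, False), repeat=3):
        key2 = "L2|L1" if l1 else "L2|~L1"
        key3 = f"L3|{'L1' if l1 else '~L1'},{'L2' if l2 else '~L2'}"
        p1 = sf["L1"] if l1 else 1.0 - sf["L1"]
        p2 = sf[key2] if l2 else 1.0 - sf[key2]
        p3 = sf[key3] if l3 else 1.0 - sf[key3]
        out[(l1, l2, l3)] = p1 * p2 * p3
    return out


scenarios = scenario_probabilities(leak_split_fractions(1e-3, 1e-5, 1e-6))
for leaks, prob in scenarios.items():
    print(leaks, prob)
```

The point of the sketch is that the split fraction for L2 or L3
must depend on which earlier leakage events occurred if the scenario
probabilities are to come out symmetric.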

Top Event FA: Failure of Flight-Critical Equipment Due to
Hydrazine Leakage


The  tenth  top  event  in  the  Stage  A Event  Tree  is  FA.  This  event 
represents  the  permanent  failure  of  flight  critical  equipment 
as  a direct  consequence  of  a fuel  leak  in  one  or  more  APUs.  No 
fault  tree  was  constructed  for  this  event  since  the  requisite 
split  fraction  is  simply  one  number  that  depends  only  on  the 
specific  leakage  conditions  for  the  scenario  being  analyzed. 

The  development  of  those  single  split  fractions  is  discussed  in 
Section  7.0. 

Top Events C1, C2, C3: Failure of 1 APU Due to Hydrazine Leakage

The eleventh, twelfth, and thirteenth top events in the Stage A
Event Tree are Ck, where k can be 1, 2, or 3. This event
represents the consequential failure of APU k due to a fuel leak in
one of the APUs (the leak can be in APU k, in some other APU, or
in some combination of both; the specific condition depends
entirely on the particular event tree scenario being analyzed).

A generic  fault  tree  applicable  to  all  of  the  Ck  event  tree  nodes 
is  presented  in  Appendix  B6.5-18.  The  numerical  result  computed 
from Fault Tree Ck directly yields the requisite split fraction
for  the  appropriate  nodes  of  Top  Event  Ck  in  the  event  tree. 

Top Events S1, S2, S3: Spurious Shutdown of APU 1, 2, or 3

The fourteenth, fifteenth, and sixteenth top events in the Stage A
Event Tree are Sk, where k can be 1, 2, or 3. This event
represents a specific type of APU recoverable failure: namely, one
involving a spurious overspeed or underspeed trip of the turbine
in APU k. This condition causes an immediate, automatic shutdown
of the affected APU, but that APU can be recovered during Stage B
by setting the associated automatic over/underspeed control
switch to the inhibit position. This particular failure mode has
been separated from all of the other recoverable failures because
of the immediate, automatic loss of the affected APU (recoverable
failures from fuel leakage do not result in immediate, automatic
shutdown of the affected APU).

The  generic  fault  tree  developed  for  Sk  is  shown  in  Appendix 
B6.5-19.  This  diagram,  like  others  described  previously,  takes 
event  occurrence  order  into  account  in  those  scenarios  in  which 
some  other  failure  is  identified  as  occurring  in  conjunction  with 
the  spurious  overspeed  or  underspeed  trip.  If  the  other  failure 
occurred first (with 50% probability), then the occurrence of the
spurious trip requires a failure of the inhibit circuitry. If the
spurious trip is first, then the inhibit circuitry is considered
not to have been activated. The basic event DARAT0 provides the
necessary  factor  for  correcting  the  probability  obtained  from  the 
other  event  in  the  event  tree,  in  the  same  manner  as  described 
previously.  The  numerical  result  computed  from  Fault  Tree  Sk 
directly  yields  the  requisite  split  fraction  for  the  Top  Event  Sk 
in  the  event  tree. 

Top  Event  BA:  Failure  of  One  or  Two  APUs  Before  Launch 

The  seventeenth  top  event  in  the  Stage  A Event  Tree  is  BA.  This 
event  represents  a correction  factor  to  distinguish  between 
failures  occurring  before  and  after  lift-off.  The  prior  events 
in  the  event  tree  account  for  all  run  failures,  regardless  of  the 
time  at  which  they  occur  while  the  APUs  are  running.  Failures 
occurring  before  lift-off  ordinarily  result  in  launch  scrub, 
while  failures  occurring  afterward  can  result  in  a variety  of 
damage  states,  depending  on  their  severity. 

The fault trees developed for BA are presented in Appendices
B6.5-20 and B6.5-21. Two trees are shown: one (labeled BA0 in
Appendix B6.5-20) that applies only to the first node for the BA
event in the event tree and the other (labeled BAn in Appendix
B6.5-21) that applies to all other nodes. The BA0 fault tree
accounts for all start failures which are not otherwise taken into
account in the fault trees developed for all other top events in
the event tree. Start failures, of course, all occur before
lift-off and are, therefore, prelaunch failures that ordinarily
lead to launch scrub. Such failures are not considered elsewhere
in the event tree logic. The BAn fault tree accounts for the start
failures and the proportion of run time that constitutes the
pre-lift-off period. This is a simple time ratio: the ratio of
pre-launch run time to the total Stage A run time. The pre-launch
run time is 5 minutes, while the post-launch Stage A run time is 13
minutes, yielding a ratio of R = 5/18 for scenarios in which one
APU has failed. The ratio becomes 2R - R^2 for scenarios in which
two APUs have failed. The numerical result computed from Fault Tree
BA directly yields the requisite split fraction for Top Event BA
in the event tree.
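
The two time ratios quoted above follow from one expression. If each
run failure is assumed to be equally likely at any time during the
18 minutes of Stage A operation, and the failure times are assumed
independent, then the probability that at least one of n failures
falls in the pre-launch window is 1 - (1 - R)^n, which gives R for
one failure and 2R - R^2 for two. The uniform timing and
independence assumptions, and the function name, are this sketch's
interpretation rather than statements from the study.

```python
# Pre-launch time ratio used in the BAn fault tree (illustrative sketch).

PRE_LAUNCH_MIN = 5.0    # APU run time before lift-off, from the text
POST_LAUNCH_MIN = 13.0  # Stage A run time after lift-off, from the text


def prelaunch_fraction(n_failed: int) -> float:
    """Probability that at least one of n_failed run failures occurs
    before lift-off, assuming uniform and independent failure times."""
    r = PRE_LAUNCH_MIN / (PRE_LAUNCH_MIN + POST_LAUNCH_MIN)
    return 1.0 - (1.0 - r) ** n_failed


print(prelaunch_fraction(1))  # R = 5/18
print(prelaunch_fraction(2))  # 2R - R^2
```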


Top Events EA, MA: Failure Occurs in Thrust Bucket, and Failure
Occurs Post MECO


The  eighteenth  and  nineteenth  top  events  in  the  event  tree  are 
EA and MA. These events identify failures that occur in the
thrust bucket (EA) and post MECO (MA). These are, like the event
BA, simply time ratios. The event EA is the ratio of time in the
thrust bucket to the total Stage A run time. The time in the
thrust  bucket  is  about  0.5  minutes,  and  the  total  Stage  A run 
time  is  18  minutes.  This  gives  a ratio  of  0.5/18,  or  0.028,  for 
the  numerical  value  of  the  split  fraction  for  the  event  EA.  The 
run  time  following  MECO  is  approximately  5 minutes,  which  gives 
a ratio  of  5/18,  or  0.28,  for  the  numerical  value  of  the  split 
fraction  for  event  MA. 

Top  Event  IA:  Intact  Abort  Called  by  MCC 

The  last  top  event  in  the  event  tree  is  IA.  This  event  identifies 
failures  that,  in  the  judgment  of  ground  personnel  and  the  flight 
crew,  cannot  support  landing  at  the  first  PLS,  thereby  resulting 
in  an  intact  abort.  This  is  a judgment  call  made  by  MCC  at  the 
time  that  the  failure  occurs.  It  is  beyond  the  scope  of  this 
study  to  evaluate  in-flight  decisions  made  by  MCC.  Therefore, 
a conservative  (50  - 50)  chance  that  this  event  would  lead  to  an 
intact  abort  was  assigned.  Although  this  may  be  conservative,  it 
does  not  significantly  affect  the  overall  frequency  of  intact 
aborts,  which  is  dominated  by  failure  in  the  thrust  bucket. 


6.5.8 Stage B Analysis


As discussed in Section 6.4.1, the analysis of the Stage A Event
Tree  leads  to  quite  a few  damage  vectors.  However,  these  damage 
vectors  were  combined  into  four  damage  bins  that  form  the  initial 
conditions  for  the  analysis  of  Stage  B.  These  four  initial 
conditions  for  the  Stage  B analysis  are  defined  in  Table  6.4.1. 

The Stage B Event Trees were developed for damage bins 4 and 7,
as  discussed  in  Section  6.4.  Each  damage  bin  constitutes  the 
initial  condition  for  a Stage  B quantification.  Each  event  tree 
has  potentially  different  split  fraction  models  that  form  the 
basis  for  quantifying  that  event  tree. 

Before  discussing  the  individual  initial  conditions,  it  is 
appropriate  to  discuss  certain  considerations  that  apply  to  all 
of  the  initial  conditions.  In  many  cases,  the  fault  trees 
needed  for  the  Stage  B analyses  are  the  same  or  very  nearly  the 
same  as  the  corresponding  fault  trees  for  Stage  A.  In  all  such 
cases,  the  primary  emphasis  in  the  discussions  that  follow  is 
to  identify  the  differences  between  the  fault  trees  for  those 
two  stages.  The  recovery  Event  (RE)  at  the  end  of  the  Stage  B 
Event  Trees  refers  to  recovery  from  failures  that  occur  during 
Stage  B;  recovery  from  Stage  A failures  is  taken  into  account 
in  the  fault  trees  in  a manner  consistent  with  the  flight  rules. 
In  Stage  A,  start  failures  were  included  in  the  Event  BA  as  a 
basis  for  identifying  launch  scrub  conditions.  In  Stage  B,  all 
start  failures  are  taken  into  account  at  the  beginning  of  the 
event  trees,  in  Top  Events  SS  and  DS.  Start  and  run  failures 
were  separated  so  that  the  time  ratios  used  in  events  like  TE 
and  PW  could  be  applied  solely  to  probabilities  that  are  time- 
based,  with  no  demand  failures  involved.  Since  the  PB  and  DB 
Events  account  for  all  run  failures  for  the  full  duration  of 
Stage  B,  there  is  no  need  to  include  run  failure  considerations 
in  the  fault  trees  for  the  RE  Event;  only  failures  to  restart 
on  demand  (if  required)  need  be  considered  in  RE.  For  initial 
conditions other than Bin 7 (also called Impact Vector 1), the
Stage B analysis is begun with at least one APU failure (either
permanent or recoverable) having occurred during Stage A. Under
such circumstances, the over/underspeed auto trip switch would
have  been  set  to  inhibit  automatic  shutdown.  This  means  that 
spurious  conditions  which  would  otherwise  cause  an  automatic 
shutdown  of  the  affected  APU  (such  as  MPU  1 failing  high  or  low) 
would  be  inhibited  and,  thereby,  prevent  shutdown  from  occurring. 
However,  if  there  is  a failure  of  the  inhibit  circuitry,  then 
such a condition can still cause a spurious shutdown. In this
case, such a shutdown is a permanent failure because there is
no way to inhibit the faulty signal. Such contributions to
permanent failure have been included in the SB Fault Tree.

One final comment concerns the analysis of Impact Vector 1. For all
scenarios involving Mk (fuel leak) and Tk (spurious automatic
shutdown), the analysis has been simplified by conservatively
assuming that the leak occurs first, which means that the
spurious trip is a permanent failure.


6.5.8.1 Initial Condition 7 (Impact Vector 1)

Based  on  the  discussion  in  Section  6.4,  initial  condition  7 (from 
Damage  Bin  7)  is  defined  as  follows: 

All  three  APUs  successful  at  the  end  of 
Stage  A 

The  fault  trees  developed  to  support  the  associated  event  tree  in 
Appendix  B6.4-2  for  this  initial  condition  are  discussed  below. 
This  initial  condition  is  referred  to  in  the  fault  tree  diagrams 
as  Impact  Vector  1. 

Top Event SS: One APU Fails to Start

The  first  top  event  in  the  Stage  B Event  Tree  is  SS.  This  event 
represents  a specific  type  of  APU  permanent  failure  — namely, 
failure  of  one  or  more  APUs  to  start  on  demand.  This  particular 
failure  mode  had  to  be  separated  from  the  run  failures  covered 
by  Top  Events  PB  and  DB  so  that  the  time  ratios  used  in  Top 
Events  TE  and  PW  would  be  applied  only  to  run-time  failures  and 
not  to  a combination  of  run-time  and  demand  failures. 

The  fault  tree  developed  for  SS  is  shown  in  Appendix  B6.5-22. 

This  diagram  is  essentially  the  same  as  the  one  developed  for 
the  start  failures  in  the  Top  Event  BA  for  Stage  A,  with  a few 
exceptions  as  described  below.  In  Stage  A,  any  kind  of  failure 
of  the  primary  valve  was  considered  grounds  for  scrubbing  the 
mission,  including  cases  in  which  the  primary  valve  fails  open. 
In  Stage  B,  however,  such  conditions  (the  valve  failing  open) 
would  not  cause  start  failure  because  the  secondary  valve  would 
begin  cycling  and  take  over  control  of  fuel  flow. 

The  other  change  was  to  remove  the  basic  event  in  which  the 
secondary  valve  leaks  before  APU  startup.  If  that  happens,  fuel 
leaks  into  the  gas  generator  and  causes  it  to  heat  up.  In  that 
case, the APU is not started because of the danger of fuel
detonation. In Stage A, that leads to launch scrub. In Stage B,
that simply delays startup of the APU until the injector spray
system can cool the temperature down to an acceptable level.

Thus, this would not be a failure unless another APU fails and
either  the  injector  spray  cooling  system  fails  or  the  APU  fails 
to  start  for  some  other  reason.  This  third-order  failure  scenario 
was  judged  to  be  of  too  low  a probability  to  be  of  any  practical 
concern  and  was  removed  from  the  analysis. 

The  numerical  result  computed  from  the  Fault  Tree  SS  directly 
yields  the  requisite  split  fraction  for  the  Top  Event  SS  in  the 
event  tree. 

Top Event DS: Second APU Fails to Start

The  second  top  event  in  the  Stage  B Event  Tree  is  DS.  This  event 
represents  a specific  type  of  APU  permanent  failure  — namely, 
failure  of  two  or  more  APUs  to  start  on  demand.  This  particular 
top event is used in conjunction with Top Event SS to be able
to  distinguish  between  cases  in  which  only  one  start  failure 
occurs  versus  cases  in  which  two  or  more  failures  occur. 

The  fault  tree  developed  for  DS  is  shown  in  Appendix  B6.5-23. 

This  diagram  is  essentially  the  same  as  the  one  developed  for 
Top  Event  SS  except  that  the  simple  OR  gate  for  the  top  event 
has  been  changed  to  a 2-out-of-3  gate.  All  other  aspects  of 
the  fault  tree  are  exactly  the  same. 

The  numerical  result  computed  from  the  Fault  Tree  DS  must  be 
divided  by  the  numerical  result  from  Fault  Tree  SS  to  obtain  the 
split  fraction  needed  for  the  Top  Event  DS;  this  split  fraction 
is  the  conditional  probability  of  two  or  more  start  failures, 
given  that  one  or  more  start  failures  are  known  to  have  occurred. 
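
The division described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that the three APUs have independent and identical start-failure probabilities. The value `q_start` and the helper `at_least_k_of_n` are hypothetical and are not taken from the fault trees in Appendices B6.5-22 and B6.5-23, which may also contain common cause contributions.

```python
from math import comb


def at_least_k_of_n(k, n, q):
    """Probability that at least k of n independent, identical units fail,
    where each unit fails with probability q."""
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(k, n + 1))


# Hypothetical per-APU start-failure probability (not a value from the study).
q_start = 1.0e-3

# Fault Tree SS: simple OR gate, i.e. one or more start failures out of three.
p_ss_tree = at_least_k_of_n(1, 3, q_start)

# Fault Tree DS: 2-out-of-3 gate, i.e. two or more start failures out of three.
p_ds_tree = at_least_k_of_n(2, 3, q_start)

# Split fractions as they would enter the event tree.
split_ss = p_ss_tree              # used directly
split_ds = p_ds_tree / p_ss_tree  # P(two or more | one or more)
```

Under these assumptions, the product of the two split fractions recovers the unconditional probability of two or more start failures, which is why the DS result is divided by the SS result before it is used in the event tree.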

Top  Event  TB:  Turbine  Overspeed 

The  third  top  event  in  the  Stage  B Event  Tree  is  TB.  This  event 
represents a specific type of APU permanent failure, namely, one
involving turbine runaway, where failures cause the turbine speed
to increase above normal operating levels and the overspeed
protective system fails to shut the turbine down. This particular
failure mode has been separated from all of the other permanent
failures because of the high potential for consequential failure
of flight-critical equipment or other APUs due to the high-energy
shrapnel and subsequent hydrazine release generated by the
overspeed.

The fault trees developed for TB are presented in Appendices
B6.5-24 through B6.5-26. Appendix B6.5-24 presents the fault tree
(labeled TB1) that applies to the first (uppermost) node for TB in
the event tree and models a turbine runaway failure of at least
one out of three APUs. Appendix B6.5-25 presents the fault tree
(labeled TB2) that applies to the second (lower) node for TB in
the event tree. This models a turbine runaway failure that
occurs after an APU start failure (Event SS). The fault tree in
Appendix B6.5-26 models the case of one turbine runaway out of
two APUs, which is provided to support top events to the right of
Event TB in the event tree where the order in which events occur
is a consideration.

The fault trees in Appendices B6.5-24 through B6.5-26 are
identical to the corresponding fault trees developed for Stage A.

The fault tree in Appendix B6.5-25 is identical to that in
Appendix B6.5-26. The numerical results computed from the TB
Fault Trees directly yield the requisite split fractions for the
two nodes of Top Event TB in the event tree.

Top  Event  PB:  Equipment  Failure  of  One  APU  After  Start 

The  fourth  top  event  in  the  Stage  B Event  Tree  is  PB.  This  event 
represents all but four contributors to the failure of at least
one of the three APUs, where the four exceptions are: (1) the
turbine runaway failures covered by TB, (2) the start failures,
which are analyzed in Top Event SS, (3) leakage events, and (4)
spurious shutdowns.

The fault trees developed for PB are presented in Appendices
B6.5-27 and B6.5-28. The first fault tree (labeled PB) models the
permanent failure of at least one out of three APUs, while the
second one (labeled PB-T) models the permanent failure of at
least one out of two APUs. This second fault tree is provided
to support top events to the right of Event PB in the event tree
where the order in which events occur is a consideration.

The  fault  trees  developed  for  the  PB  Top  Event  are  essentially  the 
same  as  those  developed  for  Stage  A.  The  major  exception  to  this  is 
the portion added to account for failures occurring during the
on-orbit portion of the mission. These failures include heaters that
fail  on  and  heaters  that  fail  off.  The  fault  tree  also  includes 
fuel  and  lube  oil  leaks  and  the  inadvertent  hot  restart  of  an  APU. 

The  numerical  result  computed  from  Fault  Tree  PB  directly  yields 
the  requisite  split  fraction  for  the  Top  Event  PB  in  the  event 
tree.

Top Event DB: Failure of Second APU After Start

The fifth top event in the Stage B Event Tree is DB. This event
represents  all  but  four  contributors  to  the  permanent  failure  of 
at  least  two  of  the  three  APUs,  where  the  four  exceptions  are  as 
identified  for  Top  Event  PB.  The  only  difference  between  this 
event  and  the  Event  PB  is  that  DB  accounts  for  at  least  two  out 
of  three  APUs  failing,  while  PB  accounts  for  at  least  one  out  of 
three  APUs  failing. 

In  the  event  tree,  the  PB  Event  represents  the  probability  of 
an  independent  permanent  failure  occurring  in  at  least  one  APU, 
and  the  DB  Event  represents  the  probability  of  an  independent 
permanent  failure  occurring  in  at  least  two  APUs,  given  that  at 
least  one  is  known  to  have  occurred.  The  scenario  in  which  PB 
occurs  and  DB  does  not  occur  represents  the  case  in  which  exactly 
one  APU  has  an  independent  permanent  failure.  The  scenario  in 
which  both  PB  and  DB  occur  represents  the  case  in  which  two  or 
more  APUs  have  independent  permanent  failures.  When  either  the 
SS  or  the  TB  Event  occurs  in  the  event  tree,  only  the  DB  Event  is 
questioned  with  regard  to  the  occurrence  of  a second  permanent 
failure  as  a result  of  an  independent  cause;  that  is,  this  case 
is  not  addressed  via  Event  PB.  This  is  simply  an  analysis 
convention  that  was  adopted  for  convenience;  this  situation  could 
just  as  well  have  been  addressed  by  using  PB. 

In the above paragraph, the term "independent" refers to
independence with respect to other top events in the event tree;
that is, it is not intended to preclude the potential occurrence
of  common  cause  failures  within  the  context  of  the  PB  and  DB 
analyses  themselves.  It  simply  means  that  the  PB  and  DB  permanent 
failures  have  been  modeled  such  that  they  are  independent  of 
other  top  events. 

The  fault  trees  developed  for  DB  are  shown  in  Appendices  B6.5-29 
through  B6.5-32.  Appendix  B6.5-29  presents  a fault  tree  (labeled 
DB1)  that  applies  to  the  first  (uppermost)  node  for  DB  in  the 
event tree and models the permanent failure of at least two out
of three APUs. Appendix B6.5-30 presents a fault tree (labeled
DB2) that applies to the second (middle) node for DB in the event
tree.  This  models  the  second  permanent  failure  that  occurs  in 
conjunction  with  the  turbine  runaway  failure  modeled  by  the  TB 
Event,  and  the  fault  tree  is  in  the  same  basic  form  as  the  PB 
Fault  Tree.  Appendix  B6.5-31  presents  a fault  tree  (labeled  DB3) 
that  applies  to  the  third  (bottom)  node  for  DB  in  the  event  tree 
and  models  the  second  permanent  failure  that  occurs  following  a 
start failure in another APU. This fault tree is very similar
to that shown in Appendix B6.5-30, except that the other failure
(failure to start) is definitely known to have occurred first,
and this knowledge simplifies the model. The fault tree in
Appendix B6.5-32 models the case of two permanent failures out of
two APUs, which is provided to support top events to the right of
Event DB in the event tree where the order in which events occur
is a consideration.

The fault tree for DB2 is another illustration of the logic
required to account for the order in which events occur, as
discussed in Section 6.5.1. If Event TB occurs first, then the
TB 1-out-of-3 fault tree model is correct, and the DB logic
must consider 1-out-of-2 failure logic. This situation is shown
on the right side of the diagram in Appendix B6.5-25. If, on the
other hand, DB occurs first, then the TB 1-out-of-3 logic must
be corrected to 1-out-of-2 logic, and the correct logic for DB
is 1-out-of-3. This situation is shown on the left side of that
figure. The correction factor represented by the Basic Event
DBTCFO is the ratio of the result from the TB-D Tree to that
from the TB Tree.
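
A rough numerical sketch of such an order correction factor is given below in Python. It assumes that the TB-D Tree corresponds to the one-out-of-two turbine runaway model and that the APUs are independent and identical; both are simplifying assumptions made here for illustration. The per-APU probability `t_runaway` and the helper `one_or_more_of_n` are hypothetical and do not come from the study.

```python
def one_or_more_of_n(n, t):
    """Probability that at least one of n independent, identical APUs
    experiences a turbine runaway, each with probability t."""
    return 1.0 - (1.0 - t) ** n


# Hypothetical per-APU turbine runaway probability (illustration only).
t_runaway = 1.0e-4

p_tb = one_or_more_of_n(3, t_runaway)    # TB Tree: 1-out-of-3 logic
p_tb_d = one_or_more_of_n(2, t_runaway)  # assumed TB-D Tree: 1-out-of-2 logic

# Order correction factor: rescales the 1-out-of-3 TB result to the
# 1-out-of-2 result needed when the DB failure has occurred first.
dbtcfo = p_tb_d / p_tb
corrected_tb = p_tb * dbtcfo
```

For small per-APU probabilities the factor in this simplified model approaches 2/3, reflecting that only two APUs remain exposed to a runaway once another APU has already failed.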

The  fault  trees  developed  for  the  DB  Top  Event  are  essentially 
the  same  as  those  developed  for  Stage  A.  The  only  exception  to 
this  is  the  adaptation  needed  to  address  the  added  node  for  the 
case  in  which  a start  failure  (via  Top  Event  SS)  occurs  first, 
and  this  fault  tree  is  very  similar  to  the  DB2  Fault  Tree.  Since 
the  DB  Fault  Tree  depends  on  the  subtrees  for  each  separate  APU 
in the PB Event, it follows that the DB Event also automatically
includes  the  on-orbit  additions  described  above  for  the  PB  Event. 

The  numerical  result  from  Fault  Tree  DB1  must  be  divided  by  the 
numerical  result  from  Fault  Tree  PB  to  obtain  the  split  fraction 
needed  for  Node  1 for  the  Event  DB;  this  split  fraction  is  the 
conditional  probability  of  two  or  more  permanent  failures  given 
that  one  or  more  permanent  failures  are  known  to  have  occurred. 
The numerical results computed from Fault Trees DB2 and DB3
directly  yield  the  requisite  split  fractions  for  Nodes  2 and  3 
of  Top  Event  DB  in  the  event  tree. 

Top Event CB: Failure of the Second APU or Failure of Flight
Critical Equipment Initiated By Failure of the First APU

The  sixth  top  event  in  the  Stage  B Event  Tree  is  CB.  This  event 
represents  the  consequential  permanent  failure  of  flight  critical 
equipment  or  at  least  one  APU  following  the  permanent  failure  of 
one other APU.

The CB Fault Trees are presented in Appendices B6.5-33 through
B6.5-35. Appendix B6.5-33 presents a fault tree (labeled CB1) that
applies to the first (uppermost) node for CB in the event tree
and models the consequential failure of flight critical equipment
or of at least one other APU following turbine break-up of one APU
at normal speed (Event PB). Appendix B6.5-34 presents a fault tree
(labeled CB2) that applies to the second (middle) node for CB in
the event tree. This models the consequential permanent failure
of flight critical equipment or at least one other APU following
a turbine runaway failure (Event TB). Appendix B6.5-35 presents
a fault tree (labeled CB3) that applies to the third (lowest)
node for CB in the event tree and models the consequential failure
of flight critical equipment or at least one other APU following
permanent start failure of one APU (Event SS). Separate fault
trees  are  required  for  the  various  nodes  because  the  potential  for 
consequential  failure  following  a turbine  runaway  is  higher  than 
for  permanent  failures  taken  into  account  in  the  PB  Event,  and  the 
probability  of  consequential  failure  following  start  failures  is 
assessed  to  be  negligibly  small.  The  numerical  results  computed 
from  the  CB  Fault  Trees  directly  yield  the  requisite  split 
fractions  for  Nodes  1 through  3 of  Top  Event  CB  in  the  event  tree. 

Top Event HB: Failure of One APU Due To Exhaust Gas Leak
or GGVM Detonation

The  seventh  top  event  in  the  Stage  B Event  Tree  is  HB.  This 
event represents the failure of at least one APU as a consequence
of an exhaust gas leak in at least one APU, or as a consequence of
external  fuel  leakage  produced  by  a detonation  resulting  from  fuel 
leaking  into  the  solenoid  cavity  of  either  GGVM  valve. 

The  model  for  the  first  part  is  exactly  the  same  as  that  developed 
for  Stage  A.  The  second  part  was  not  included  in  the  Stage  A 
analysis  because  it  was  judged  to  be  a very  low  likelihood  event 
during  that  part  of  the  mission  because  of  its  very  short  duration. 
It  would  take  time  for  the  fuel  to  leak  into  the  solenoid  cavity 
and  for  the  subsequent  fuel  decomposition  and  detonation  to  occur. 
For  Stage  B,  however,  it  has  a higher  likelihood  of  occurrence 
because of the longer duration, most particularly during the long
on-orbit period. Because of the knowledge acquired concerning the
very  low  likelihood  of  failure  as  a consequence  of  exhaust  gas 
leaks,  it  became  clear  that  Event  HB  is  dominated  by  the  solenoid 
detonation  event. 

There  are  two  classes  of  solenoid  detonation  events  that  can 
occur.  One  involves  the  GGVM;  the  other,  the  isolation  valves. 

In the case of the GGVM, the consequential external fuel leakage
is smaller (because of closed isolation valves) and is much more
apt to result in failure of only the leaking APU. In the case
of the isolation valves, the consequential external fuel leakage
is much more massive (coming directly from the fuel tank because
of the inability to isolate the leak) and is expected to fail
more than just the leaking APU. Based on these considerations,
it seemed reasonable to cover the GGVM case in the Event HB,
which  addresses  single  APU  failures,  and  to  include  the  isolation 
valve  case  in  Event  GB,  which  covers  multiple  APU  failures. 

The  fault  trees  developed  for  HB  are  presented  in  Appendices 
B6.5-36  through  B6.5-40.  Appendix  B6.5-36  presents  a Fault  Tree 
(labeled  HB1)  that  applies  to  the  first  (uppermost)  node  for  HB 
in  the  event  tree  and  models  the  permanent  failure  of  at  least 
one  out  of  three  APUs  as  the  primary  consequence  of  an  external 
fuel  leak  caused  by  detonation  in  the  GGVM  as  a result  of  fuel 
leakage  into  one  of  the  two  solenoid  cavities. 

Appendix B6.5-37 presents a fault tree (labeled HB2) that applies
to the second node for HB in the event tree. This models, in
conjunction with another permanent failure (from Event PB), the
permanent failure of a second APU as the primary consequence of
an external fuel leak caused by detonation in the GGVM because of
fuel leakage into one of the two solenoid cavities.

Appendix B6.5-38 presents a fault tree (labeled HB3) that applies
to the third node for HB in the event tree and models, in
conjunction with a turbine runaway failure (from Event TB), the
permanent failure of a second APU as the primary consequence of
an external fuel leak caused by detonation in the GGVM because of
fuel leakage into one of the two solenoid cavities.

Appendix B6.5-39 presents a fault tree (labeled HB4) that
applies to the fourth node for HB in the event tree. This
models, in conjunction with a permanent start failure of one APU
(from Event SS), the permanent failure of a second APU as the
primary consequence of an external fuel leak caused by detonation
in the GGVM because of fuel leakage into one of the two solenoid
cavities.

Separate fault trees are required for the various nodes to
properly account for the order correction factors. The
exhaust-gas-leak portions of the event trees are exactly the same
as those developed for Stage A. The new fuel-leak portions simply
identify the ways in which fuel can leak into the solenoid
cavities and account for the resultant potential for detonation.

The numerical results computed from the HB Fault Trees directly
yield the requisite split fractions for the various nodes for
the HB Event.

Top Event GB: Failure of Flight Critical Equipment or Second
APU Due to Exhaust Gas Leak or Valve Detonation

The  eighth  top  event  in  the  Stage  B Event  Tree  is  GB.  This  event 
represents  the  failure  of  at  least  two  APUs  as  a consequence  of 
an  exhaust  gas  leak  in  at  least  two  APUs,  or  as  a consequence  of 
massive  external  fuel  leakage  produced  by  a detonation  resulting 
from  fuel  leaking  into  the  solenoid  cavity  of  one  of  the  isolation 
valves  in  an  APU. 

The  model  for  the  first  part  is  exactly  the  same  as  that  developed 
for Stage A. The second part was not included in the Stage A
analysis because it was judged to be a very low likelihood event
during that part of the mission because of its very short duration.
It would take time for the fuel to leak into the solenoid cavity
and for the subsequent fuel decomposition and detonation to occur.
For Stage B, however, it has a higher likelihood of occurrence
because of the longer duration of the on-orbit period. Because of
the knowledge acquired concerning the very low likelihood of
failure as a consequence of exhaust gas leaks, it became clear
that Event GB is dominated by the solenoid detonation event.

As  discussed  for  the  HB  Event,  there  are  two  classes  of  solenoid 
detonation events that can occur. One involves the GGVM; the
other, the isolation valves. In the case of the GGVM, the
consequential external fuel leakage is smaller and is much more apt
to result in failure of only the leaking APU. In the case of
the  isolation  valves,  the  consequential  external  fuel  leakage 
is  much  more  massive  (coming  directly  from  the  fuel  tank)  and  is 
expected  to  fail  more  than  just  the  leaking  APU.  For  scenarios 
involving  both  the  HB  and  GB  Events,  those  two  events  can  most 
reasonably  be  treated  as  separate,  independent  events.  That  is, 
the  numerical  result  from  the  GB  quantification  directly  provides 
the  requisite  split  fraction  for  the  Top  Event  GB  in  the  event 
tree  (which  is  considered  acceptable  because  the  exhaust  gas  leak 
probabilities  are  so  very  small  with  respect  to  the  solenoid 
detonation considerations).

The fault trees developed for GB are presented in Appendices
B6.5-41 through B6.5-46. Appendices B6.5-41 through B6.5-45 present
five different fault trees: one for scenarios with no other
failures to the left of GB in the event tree, and one each for
scenarios involving Events HB, PB, TB, and SS. These fault trees
are very similar to those developed for the Event HB. The
numerical results computed from those five fault
trees  provide  the  requisite  split  fractions  for  the  five  nodes  of 
Top  Event  GB  in  the  event  tree.  No  conditional  probabilities  are 
computed  as  was  done  in  the  Stage  A analysis.  The  Fault  Tree  GB-T 
presented  in  Appendix  B6.5-46  is  used  in  the  same  basic  manner  as 
described  above  for  Fault  Tree  DB-T  for  Event  DB. 

Top Events M1, M2, M3: Hydrazine Leakage from APU 1, 2, or 3

The ninth, tenth, and eleventh top events in the Stage B Event
Tree presented in Figure 6.4.2 are Mk, where k can be 1, 2, or 3.
This  event  was  analyzed  in  exactly  the  same  manner  as  was  done 
for  Stage  A,  and  the  event  tree  used  for  representing  the 
requisite  split  fractions  is  shown  in  Appendix  B6.5-47.  To  use 
the  leakage  split  fractions  listed  in  that  appendix,  it  is  simply 
a matter  of  matching  those  nodes  in  that  figure  with  the 
corresponding  nodes  in  the  event  tree  in  Figure  6.4.2.  The  Split 
Fraction  P21  for  Node  1 of  the  Event  M2  is  matched  to  all  nodes 
in the event tree for which M2 occurs when M1 does not occur.
Likewise, the Split Fraction P22 for Node 2 of the Event M2 is
matched to all nodes in the event tree for which M1 does occur.

A similar  approach  is  used  for  the  nodes  for  M3. 

Top Event FB: Failure of Flight Critical Equipment Due to
Spatial Interaction Initiated by Hydrazine Leakage

The  twelfth  top  event  in  the  Stage  B Event  Tree  is  FB.  This  event 
represents  the  permanent  failure  of  flight  critical  equipment  as  a 
direct  consequence  of  a fuel  leak  in  one  or  more  APUs.  No  fault 
tree  was  constructed  for  this  event  since  the  requisite  split 
fraction  is  simply  one  number  that  depends  only  on  the  specific 
leakage conditions for the scenario being analyzed. The
development of those single split fractions is discussed in Section 7.0.

Top Events D1, D2, D3: Hydrazine Leakage Causes Failure of
APU 1, 2, or 3 Given That Two APUs Have Hot Failed

The  thirteenth,  fourteenth,  and  fifteenth  top  events  in  the  Stage 
B Event  Tree  are  Dk,  where  k can  be  1,  2,  or  3.  This  event 
represents  the  consequential  failure  of  APU  k due  to  a fuel  leak 
in  one  of  the  APUs.  The  leak  can  be  in  APU  k,  in  some  other  APU, 
or  in  some  combination  of  both.  The  specific  condition  depends 
entirely on the particular event tree scenario being analyzed.

A generic fault tree applicable to all of the Dk Event Tree nodes
is  presented  in  Appendix  B6.5-48.  This  fault  tree  is  exactly  the 
same  as  that  developed  for  the  Stage  A analysis. 

Top Events R1, R2, R3: Leak in APU 1, 2, or 3 Before EI-13 or
Into Pump Seal Cavity

The sixteenth, seventeenth, and eighteenth top events in the
Stage B Event Tree are Rk, where k can be 1, 2, or 3. This event
represents the shutdown of an APU because of a small fuel leak.
Included in this category are all pump seal leaks. Also included
are small leaks that occur before EI and are detected. The
generic fault tree for this event is presented in Appendix
B6.5-49. The probability that a leak occurs before EI given that a
leak has occurred is taken to be a time ratio:


$$\frac{T_{\mathrm{EI}} - T_{\mathrm{TIG-5}}}{T_{\mathrm{SD}} - T_{\mathrm{TIG-5}}} \qquad (\mathrm{SD} = \text{shutdown})$$


This  fraction  is  conservative  (small)  in  that  it  is  based  on 
the  time  TIG-5  rather  than  some  average  of  the  start  times 
from  TIG-5  to  EI-13.  The  value  of  this  fraction  is  25/66, 
or  0.38. 
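
The arithmetic behind this time ratio can be sketched in Python as follows. The sketch places TIG-5 at time zero and uses only the two intervals quoted above (25 and 66 minutes). The function name `time_ratio` is hypothetical. Treating a time ratio as a probability amounts to assuming that the leak onset is equally likely at any time between TIG-5 and shutdown, which is an interpretation made here and is not stated in the text.

```python
# Event times in minutes, measured from TIG-5 (taken as time zero).
# Only the two intervals quoted in the text are used: 25 and 66 minutes.
T_TIG_5 = 0.0
T_EI = 25.0
T_SD = 66.0  # SD = shutdown


def time_ratio(t_start, t_mark, t_end):
    """Fraction of the interval [t_start, t_end] that lies before t_mark."""
    if t_end <= t_start or not (t_start <= t_mark <= t_end):
        raise ValueError("expected t_start <= t_mark <= t_end and t_start < t_end")
    return (t_mark - t_start) / (t_end - t_start)


p_leak_before_ei = time_ratio(T_TIG_5, T_EI, T_SD)
print(round(p_leak_before_ei, 2))  # 0.38
```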


Top Events T1, T2, T3: Spurious Shutdown of APU 1, 2, or 3

The  nineteenth,  twentieth,  and  twenty-first  top  events  in  the 
Stage  B Event  Tree  are  Tk,  where  k can  be  1,  2,  or  3.  This 
event  represents  an  APU  recoverable  failure  involving  a 
spurious  overspeed  or  underspeed  trip  of  the  turbine  in  APU  k. 
This  condition  causes  an  immediate,  automatic  shutdown  of  the 
affected  APU,  but  that  APU  can  normally  be  recovered  later 
during  Stage  B by  setting  the  associated  automatic  shutdown 
switch  to  the  inhibit  position.  This  particular  failure  mode 
has  been  separated  from  all  of  the  other  recoverable  failures 
because  of  the  immediate,  automatic  loss  of  the  affected  APU. 

The  generic  fault  tree  developed  for  Tk  is  shown  in  Appendix 
B6.5-50.  This  fault  tree  is  essentially  the  same  as  that 
developed  for  the  Stage  A analysis. 

Top Events TE, PW: Failure of at Least One APU After TAEM-3.5
Minutes; Failure of at Least One APU After Wheelstop

The twenty-second and twenty-third top events in the event tree
are TE and PW. These events identify failures that occur during
TAEM (TE) and post wheelstop (PW).

The  split  fractions  for  the  TE  Event  simply  involve  time  ratios. 
The  specific  manner  in  which  the  ratio  is  used  depends  on  the 
specific  scenario  being  analyzed.  The  fundamental  probability  (a 
time  ratio)  is  defined  as  follows: 


$$p = \frac{T_{\mathrm{SD}} - T_{\mathrm{TAEM-3.5}}}{T_{\mathrm{SD}} - T_{\mathrm{EI-13}}} \qquad (\mathrm{SD} = \text{shutdown})$$


Based  on  this  formula,  the  following  expressions  can  be  used  to 
calculate  the  split  fractions  for  the  associated  scenario 
conditions: 


$p$ . . . . . . . . . . 1 of 1 fails after TAEM

$2p - p^2$ . . . . . . . 1 of 2 fails after TAEM

$p^2$ . . . . . . . . . 2 of 2 fail after TAEM

The above estimates are conservative (high) in that they are based
on EI-13 in the denominator instead of some average value between
TIG-5 and EI-13. The value of the fundamental probability p is
taken to be 20.5/54, or 0.38.
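
The three TE expressions can be evaluated with the short Python sketch below. The function name `te_split_fractions` and the dictionary keys are hypothetical labels introduced for illustration. The sketch also assumes that, when two failures are involved, their timing is independent, in which case the middle expression equals the probability that at least one of the two falls after TAEM.

```python
def te_split_fractions(p):
    """Split fractions for Top Event TE built from the fundamental time ratio p."""
    return {
        "1 of 1": p,              # the single failure falls after TAEM
        "1 of 2": 2 * p - p**2,   # algebraically equal to 1 - (1 - p)**2
        "2 of 2": p**2,           # both failures fall after TAEM
    }


# Fundamental probability quoted in the text: 20.5/54.
p = 20.5 / 54.0
fractions = te_split_fractions(p)

for condition, value in fractions.items():
    print(condition, value)
```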


In  the  case  of  PW,  a simple  time  ratio  can  be  used  for  scenarios 
having  APUs  failing  for  causes  other  than  fuel  leaks,  while  a 
more  complex  formulation  is  needed  for  scenarios  involving  fuel 
leaks.  A simple  time  ratio  is  not  adequate  in  the  case  of  fuel 
leakage  because  of  the  time  delay  inherent  in  accumulating 
sufficient  hydrazine  in  the  aft  compartment  to  cause  damage 
given  the  onset  of  a leak.  For  cases  involving  a simple  time 
ratio,  the  following  fundamental  probabilities  are  defined: 


$$P_T = \frac{T_{SD} - T_{WS}}{T_{SD} - T_{TAEM-3.5}}$$

for failures occurring after TAEM-3.5

$$P_F = \frac{T_{SD} - T_{WS}}{T_{SD} - T_{TIG-5}}$$

for all other failures

Based  on  these  formulas,  the  following  expressions  are  used  to 
calculate  the  split  fractions  for  the  associated  scenario 
conditions:

$P_T$: 1 of 1 event that occurs after TAEM-3.5
also occurs after wheelstop

$2P_T - P_T^2$: 1 of 2 events that occur after TAEM-3.5
also occurs after wheelstop

$P_T^2$: 2 of 2 events that occur after TAEM-3.5
also occur after wheelstop

$P_F$: 1 of 1 event occurs after wheelstop

$2P_F - P_F^2$: 1 of 2 events occurs after wheelstop

$P_F^2$: 2 of 2 events occur after wheelstop

The value of $P_T$ is 10/20.5, or 0.49. The value of $P_F$ is 10/66,
or 0.15.


For cases requiring the more complex formulation (that is, when
fuel leakage is involved), their bases can be described using the
diagram presented in Appendix B6.5-51. The horizontal scale is a
non-linear time scale. The vertical scale at the right indicates
the total amount of fuel leaked, while the scale at the left
indicates the total amount of leaking fuel accumulated in the aft
compartment. The shaded region labeled T in the center represents
uncertainty in the threshold amount of fuel required in the
aft compartment to support combustion. Line L1 indicates a leak
occurring on-orbit. In orbit, the vent doors are open, so leaking
fuel can exit the aft compartment. It is not until after the vent
doors are closed for deorbit that fuel can begin to accumulate in
the aft compartment, as indicated by line A1. Line A1 intersects
threshold region T at some point between TAEM and wheelstop,
indicating that a fire would be expected to begin before
wheelstop. Line A2 shows a leak occurring after EI. Hydrazine begins
to accumulate in the aft compartment immediately. That line
intersects the threshold region T after wheelstop, indicating that
a fire is expected to be delayed until after wheelstop.

From this overview perspective, the PW split fraction is computed
as follows:

$$
PW = \frac{T_0 + T_{TAEM-3.5} - T_{TIG-5}}{T_0 + T_{SD} - T_{TIG-5}} \times PWDWS
+ \frac{T_{SD} - T_{TAEM-3.5}}{T_0 + T_{SD} - T_{TIG-5}} \times \frac{T_{SD} - T_{WS} + T_{BU}}{T_{SD} - T_{TAEM-3.5}}
$$

The coefficient of PWDWS is the fraction of the total Stage B time
that occurs before TAEM-3.5. PWDWS is the conditional probability
that damage from a fuel leak occurs after wheelstop, given that the
leak occurs before TAEM-3.5. The value for this probability was
evaluated from the distribution presented in Appendix B6.5-52. This
distribution was developed from a review of the leakage data in the
database. The point estimate (mean) from this distribution is 0.7.
In the second term, the first ratio represents the fraction of the
total Stage B time that occurs after TAEM-3.5. The second ratio
represents the fraction of the post-TAEM-3.5 time that a leak
occurs late enough to permit the build-up delay of the fuel in the
aft compartment to delay the consequential damage until some time
after wheelstop. This build-up time, TBU, was assessed to be
about 4 minutes, based on an evaluation of available leakage
information in the database.
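
As a rough sketch of how the two terms of the fuel-leak PW split fraction combine, the Python below evaluates the expression from time intervals measured back from shutdown. The function and parameter names are assumptions made for illustration. The interval values (66, 20.5, and 10 minutes) are the denominators and numerator quoted for the simple time ratios earlier in this section. T0 is not given a value in this section, so the default of zero is purely a placeholder and the resulting number should not be read as a result of the study.

```python
def pw_fuel_leak_split_fraction(
    t_tig_to_sd: float,   # T_SD - T_TIG-5, minutes
    t_taem_to_sd: float,  # T_SD - T_TAEM-3.5, minutes
    t_ws_to_sd: float,    # T_SD - T_WS, minutes
    t_bu: float,          # fuel build-up time T_BU, minutes
    p_wdws: float,        # PWDWS, mean of the Appendix B6.5-52 distribution
    t_0: float = 0.0,     # T_0: placeholder, value not stated in this section
) -> float:
    """Evaluate the two-term PW split fraction for fuel-leak scenarios."""
    total = t_0 + t_tig_to_sd                         # total Stage B time
    before_taem = (total - t_taem_to_sd) / total      # coefficient of PWDWS
    after_taem = t_taem_to_sd / total                 # first ratio, second term
    late_enough = (t_ws_to_sd + t_bu) / t_taem_to_sd  # second ratio, second term
    return before_taem * p_wdws + after_taem * late_enough


pw = pw_fuel_leak_split_fraction(66.0, 20.5, 10.0, 4.0, 0.7)
```

Expressing every interval relative to shutdown keeps the code aligned with the ratios quoted in the text and avoids needing absolute mission times.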

Top  Event  RE:  Failure  to  Recover  APU  When  Needed  For  Landing 

The  last  top  event  in  the  event  tree  is  RE.  This  event  covers 
failure  to  recover  APUs  that  failed  during  Stage  B.  This 
includes  failure  to  restart  and  detonation  at  restart.  Run 
failures  are  covered  by  the  PB  and  DB  Events,  and  consequential 
failures  due  to  fuel  leaks  are  covered  by  the  Dk  Events. 

Although  there  is  one  basic  fault  tree,  there  are  three  variations 
of  it,  based  on  the  specific  scenario  being  analyzed.  These  three 
forms are presented in Appendices B6.5-53 through B6.5-55: one for
the case of small fuel leaks (REL), one for the case of a spurious
shutdown (RES), and one for scenarios involving both a spurious
shutdown and a small fuel leak (RELS).


6.5.8.2 Initial Condition 4 (Impact Vector 2)

Based  on  the  discussion  in  Section  6.4,  initial  condition  4 (from 
damage  bin  4)  is  defined  as  follows: 

One APU permanently failed and one
APU spurious shutdown during Stage A

The  split  fraction  models  discussed  below  support  the  event  tree 
in  Appendix  B6.4-3  that  was  developed  for  this  initial  condition. 
The  fault  tree  diagrams  refer  to  this  initial  condition  as  Impact 
Vector  2. 
Top Event DS: Second APU Fails to Start

The  first  top  event  in  the  Stage  B Event  Tree  is  DS.  This  event 
represents  a specific  type  of  APU  permanent  failure  — namely, 
failure  of  a second  APU  because  of  failure  to  start  on  demand. 

The  fault  tree  developed  for  DS  is  presented  in  Appendix  B6.5-56. 
This  diagram  is  essentially  the  same  as  the  one  developed  for  the 
start failures in the Top Event SS for Stage B, initial condition
7 (Impact Vector 1). The only difference from that model is that
the top gate is based on 1-out-of-2 logic, rather than the
1-out-of-3 logic used for Impact Vector 1.

The numerical result computed from the Fault Tree DS directly
yields the requisite split fraction for the Top Event DS in the
event tree.
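
To illustrate the effect of changing the top gate from 1-out-of-3 to 1-out-of-2 logic, the sketch below computes the probability that at least one of n units fails. It assumes independent units with an identical failure probability, which is a simplification introduced here; the actual fault trees in the appendices may include shared or dependent causes. The per-APU value used is hypothetical and is not taken from the study.

```python
def at_least_one_fails(q: float, n: int) -> float:
    """Probability that at least one of n units fails.

    Assumes independent units that each fail with probability q
    (an illustrative simplification, not the study's fault tree).
    """
    return 1.0 - (1.0 - q) ** n


q = 0.01  # hypothetical per-APU start-failure probability
impact_vector_1 = at_least_one_fails(q, 3)  # 1-out-of-3 top gate
impact_vector_2 = at_least_one_fails(q, 2)  # 1-out-of-2 top gate
```

Under these assumptions, removing one APU from the gate lowers the top-event probability roughly in proportion to the number of units remaining.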

Top  Event  TB:  Turbine  Overspeed 

The  second  top  event  in  the  Stage  B Event  Tree  is  TB.  This  event 
represents  a specific  type  of  APU  permanent  failure  — namely, 
one  involving  turbine  runaway,  where  failures  cause  the  turbine 
speed  to  increase  above  normal  operating  levels  and  the  overspeed 
protection  system  fails  to  shut  the  turbine  down. 

The  fault  tree  developed  for  TB  is  presented  in  Appendix  B6.5-57. 
This  model  is  essentially  the  same  as  that  developed  for  Event 
TB for Stage B, Impact Vector 1. The only difference is that the
top gate has 1-out-of-2 logic instead of the 1-out-of-3 logic
used for Impact Vector 1. The numerical result computed from
Fault  Tree  TB  directly  yields  the  requisite  split  fraction  for 
the  Top  Event  TB  in  the  event  tree. 

Top  Event  DB:  Failure  of  the  Second  APU  After  It  Starts 

The  third  top  event  in  the  Stage  B Event  Tree  is  DB.  The  fault 
tree  developed  for  DB  is  presented  in  Appendix  B6.5-58.  This 
model  is  essentially  the  same  as  that  developed  for  Event  PB  for 
Stage B, Impact Vector 1. The only difference is that the top
gate has 1-out-of-2 logic instead of the 1-out-of-3 logic used
for Impact Vector 1. The numerical result computed from Fault
Tree DB directly yields the requisite split fraction for the Top
Event DB in the event tree.
Top Event SB: Uninhibited Spurious Shutdown of at Least One APU

The fourth top event in the Stage B Event Tree is SB. This event
represents a variation on a specific type of APU failure,
namely, one involving a spurious overspeed or underspeed trip
of the turbine in APU k. While this condition causes an
immediate, automatic shutdown of the affected APU, its effects
for Impact Vector 2 in Stage B are quite different from those
for Impact Vector 1 in Stage B. For Impact Vector 1, the
automatic trip circuitry can subsequently be manually set to the
inhibit position so that the APU can be started at a later time.

For  Impact  Vector  2,  however,  that  inhibit  selection  was  made 
before  any  of  the  APUs  were  started  for  the  entry  phase  of  the 
mission.  Thus,  if  a spurious  shutdown  occurs  anyway,  it  means 
that  the  inhibit  circuitry  was  not  functioning  properly  and  that 
the APU cannot be restarted. This instance of a spurious
shutdown represents a permanent failure.

The  fault  tree  developed  for  SB  is  presented  in  Appendix  B6.5-59. 
This  fault  tree  is  similar  to  the  fault  tree  for  Tk  in  the 
analysis  for  Impact  Vector  1,  except  that  it  has  been  changed 
to  account  for  failure  of  the  inhibit  circuitry.  The  numerical 
result  computed  from  Fault  Tree  SB  directly  yields  the  requisite 
split  fraction  for  the  Top  Event  SB  in  the  event  tree. 

Top Event HB: Failure of One APU Due to Exhaust Gas Leak or
GGVM Detonation

The  fifth  top  event  in  the  Stage  B Event  Tree  is  HB1.  This  model 
is  essentially  the  same  as  that  developed  for  Event  HB  for  Stage  B, 
Impact  Vector  1.  The  only  difference  is  that  all  failures  of  APU  3 
have  been  deleted  from  the  model.  The  numerical  result  computed 
from  Fault  Tree  HB1  directly  yields  the  requisite  split  fraction 
for  the  Top  Event  HB  in  the  event  tree. 

Top Event GB: Failure of Flight Critical Equipment or a Second
APU Due to Exhaust Gas Leak or Valve Detonation

The  sixth  top  event  in  the  Stage  B Event  Tree  is  GBO.  This  model 
is  essentially  the  same  as  that  developed  for  Event  GB  for  Stage 
B,  Impact  Vector  1.  The  only  difference  is  that  hot  gas  leak  of 
APU  3 (the  name  assigned  to  the  APU  that  failed  during  Stage  A) 
cannot  occur,  and  that  basic  event  has  been  deleted  from  the 
fault  tree.  However,  since  fuel  can  still  leak  into  the  solenoid 
cavities  in  the  isolation  valves  of  the  failed  APU,  that  basic 
event  has  been  retained.  The  numerical  result  computed  from 
Fault Tree GBO directly yields the requisite split fraction for
the  Top  Event  GB  in  the  event  tree. 

Top Events M1, M2, M3: Hydrazine Leakage from APU 1, 2, or 3

The  seventh,  eighth,  and  ninth  top  events  in  the  Stage  B Event 
Tree  are  MX,  where  X can  be  1,  2,  or  3.  This  event  was  analyzed 
in  exactly  the  same  manner  as  was  done  for  Stage  B,  Impact 
Vector  1,  and  the  event  tree  depicting  all  admissible  states  is 
presented  in  Appendix  B6.5-62. 

Top Event FB: Failure of Flight Critical Equipment Due to Spatial
Interactions Initiated by Hydrazine Leakage

The  tenth  top  event  in  the  Stage  B Event  Tree  is  FB.  This  event 
represents  the  permanent  failure  of  flight  critical  equipment  as 
a direct  consequence  of  a fuel  leak  in  one  or  more  APUs.  No  fault 
tree  was  constructed  for  this  event  since  the  requisite  split 
fraction  is  simply  one  number  that  depends  only  on  the  specific 
leakage conditions for the scenario being analyzed. The
development of those single split fractions is discussed in Section 7.

Top Event PW: Failure of at Least One APU After Wheelstop

The  eleventh  top  event  in  the  event  tree  is  PW.  This  event 
is  analyzed  in  exactly  the  same  manner  as  discussed  in  the 
preceding  TE  and  PW  Events  for  Stage  B,  Impact  Vector  1 
(Initial Condition 7).


6.5.8.3 Initial Conditions 5 and 6 (Impact Vectors 3 and 4)

As discussed in Sections 6.4 and 8, the maximum possible
collective contribution of both of these initial conditions to LOC/V
is of the order of 1 percent or less. Neither of these
conditions  can  possibly  make  a dominant  contribution  to  the  risk 
of  loss  of  crew  or  vehicle.  Since  little  significant  additional 
knowledge  or  insights  can  be  gained  by  analyzing  either  of  these 
two  initial  conditions,  there  is  no  need  to  develop  the  event 
trees  or  fault  trees  associated  with  either  of  these  initial 
conditions,  and  no  such  trees  are  presented. 


6.6  SPATIAL  INTERACTIVE  EVENTS  (SIEs) 

An  SIE  is  a propagating  failure  within  a system  or  a cascading 
failure  into  another  system  that  results  from  an  initiating 
failure or condition by virtue of close proximity. To be an
SIE,  a consequential  failure  must  also  be  initiated  by  means  of 
a physical  interactive  mechanism  such  as  hot  gas  or  shrapnel 
that  results  from  failure  of  or  degraded  operation  of  the  system. 
Thus,  a detonation  of  fuel  in  one  APU  Gas  Generator  Valve  Module 
(GGVM)  because  of  an  exhaust  leak  in  another  APU  is  a spatial 
interactive  event,  whereas  loss  of  an  APU  because  of  a secondary 
fuel  valve  failure  to  the  closed  position  in  the  GGVM  is  not. 


The  split  fraction  representing  an  SIE  is  modeled  as  a conditional 
probability  distribution  as  described  in  Section  5.4.  The  SIE 
split  fractions  discussed  in  this  analysis  are  a subset  of  the  set 
of  all  split  fractions  defined  by  the  node  points  on  the  APU  event 
trees.

Three  types  of  SIEs  have  been  identified  as  significant  for  this 
study.  They  are  (1)  events  related  to  APU  turbine  breakup,  (2) 
events  related  to  APU  fuel  (hydrazine)  leakage,  and  (3)  events 
related  to  hot  exhaust  gas  leakage.  The  impact  of  an  SIE  depends 
in  some  cases  on  the  flight  phase  in  which  it  occurs.  In  these 
cases,  the  conditional  probability  distributions  modeling  the 
split  fraction  will  vary  from  one  phase  to  the  next.  The  three 
categories  of  SIEs  are  discussed  in  the  paragraphs  below. 


6.6.1  Events  Related  to  APU  Turbine  Breakup 

The  SIEs  resulting  from  APU  turbine  breakup  are  those  in  which 
turbine  fragments  directly  damage  other  APUs  or  flight  critical 
equipment,  or  in  which  fuel  leaking  from  the  damaged  APU  then 
damages  other  equipment.  Leaking  fuel  can  cause  contact 
corrosion,  flames  from  decomposition,  or  flames  from  combustion. 
SIEs  initiated  by  fuel  leakage  are  discussed  in  6.6.2.  SIEs 
initiated  by  turbine  fragments  are  reflected  in  the  conditional 
probability  distributions  defined  in  Section  7.6.1,  and  are 
discussed  below. 

Turbine  breakup  can  occur  while  the  APU  is  operating  in  its  normal 
design  speed  range  or  during  an  uncontrolled  overspeed.  A breakup 
at  normal  speed  would  result  from  installation  of  a seriously 
flawed  turbine  or  from  the  propagation  to  critical  size  of  a minute 
crack  that  was  not  detected  in  pre-installation  inspection.  Data 
to  develop  these  failure  frequencies  are  presented  in  Section  7.6. 

A breakup  at  overspeed  would  result  when  both  fuel  control  valves 
fail  to  close  on  command.  This  could  result  from  a failure  in  the 
valves  themselves  or  in  the  APU  controller.  Data  to  develop  these 
component  failure  frequencies  are  presented  in  Section  7.5. 
The effects of an SIE initiated by turbine breakup depend on the
energy level of the uncontained fragments, the likelihood of a
fragment striking a piece of critical equipment, and the
vulnerability of that equipment to damage. The energy level of the
uncontained  fragments  is  determined  by  the  speed  at  which  the 
turbine  breaks  up  and  by  the  energy  absorbed  in  breaking  out  of 
the  APU  housing.  Uncontained  fragments  from  turbine  breakup  at 
normal  speeds  would,  therefore,  have  significantly  lower  energy 
levels  than  fragments  from  an  overspeed  breakup.  The  APU  housing 
design  is  a factor  as  well.  The  containment  ring  on  the  HPU  is 
26%  larger  than  that  on  the  APU.  For  this  reason,  the  energy 
levels  of  uncontained  fragments  from  the  APU  are  significantly 
higher  than  those  from  the  HPU. 

Determining the likelihood of a fragment striking a piece of
critical equipment is a complex analytical task, but one for which
Monte Carlo-based techniques have been developed. The aft
compartment is extremely crowded, not only with APU fuel lines and
tanks, but also with LH2 and LO2 feedlines, avionics bays, hydraulic
lines, and numerous wiring harnesses. The probability distribution
to describe this likelihood was, therefore, based on all
available knowledge, including test and analytical data, and was
developed subjectively, using the process described in Sections
5.8 and 7.6.
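
The general idea behind a Monte Carlo estimate of strike likelihood can be shown with a deliberately simplified toy model. The sketch below is not the technique used in the study: it assumes fragments leave the turbine in the plane of rotation at a uniformly random angle, and it represents critical equipment as angular sectors seen from the turbine. The sector values, function name, and trial count are all hypothetical.

```python
import random


def estimate_strike_probability(target_arcs, trials=100_000, seed=1):
    """Estimate the chance that a fragment's departure angle hits a target.

    target_arcs: list of (start_deg, end_deg) sectors, as seen from the
    turbine, occupied by critical equipment (hypothetical geometry).
    Fragments are assumed to depart at a uniformly distributed angle.
    """
    rng = random.Random(seed)  # seeded so the estimate is repeatable
    hits = 0
    for _ in range(trials):
        angle = rng.uniform(0.0, 360.0)
        if any(start <= angle < end for start, end in target_arcs):
            hits += 1
    return hits / trials


arcs = [(10.0, 40.0), (200.0, 215.0)]  # made-up sectors, 45 degrees in total
estimate = estimate_strike_probability(arcs)
```

For this toy geometry the estimate should approach 45/360, or 0.125, as the number of trials grows. A realistic model would also need fragment energy, three-dimensional geometry, and shielding, none of which are represented here.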

The  vulnerability  of  equipment  to  damage  is  determined  by  the 
fragility  of  the  equipment  compared  to  the  possible  energy  levels 
of  the  fragments. 

Shrapnel  may  be  generated  by  means  other  than  turbine  breakup, 
such  as  APU  fuel  detonation  in  a fuel  line  or  from  a gearbox 
failure.  These  possibilities  were  also  evaluated  and  were  not 
considered  significant. 


6.6.2  Events  Related  to  APU  Fuel  Leakage 

The  SIEs  that  result  from  APU  fuel  leakage  are  those  in  which 
fuel  leaking  from  an  APU  damages  flight  critical  equipment  or  an 
APU.  SIEs  associated  with  APU  fuel  leakage  are  reflected  in  the 
conditional  probability  distributions  defined  in  Section  7.6.2. 
Values  assigned  to  the  split  fractions  are  also  discussed  in 
Section  7.6  below. 
Another potential source of detonation is the APU fuel pump seal.
If the carbon face of this seal were to be damaged and allow
metal to rub against metal, then high temperatures or sparks
would  be  produced,  causing  hydrazine  detonation  within  the  fuel 
pump.  This  event  occurred  once  during  testing  of  an  APU. 

The  two  scenarios  described  below  may  produce  conditions  leading 
to  hydrazine  detonation  upon  restarting  a leaking  APU. 

Scenario  1: 

Condition: An APU fuel leak occurs between the closed fuel
isolation valve and the fuel pump; the hydrazine in the connecting
fuel line leaks away, leaving only hydrazine vapor or a vacuum.

Result: When the fuel isolation valve is opened just prior to
restarting the APU, the fuel will surge along the line and
compress the hydrazine vapor, perhaps causing detonation. Even
if no vapor remains in the fuel line, the action of the hydrazine
accelerating into the line past the fuel isolation valve could
introduce vapor bubbles into the fuel stream, and the water-hammer
effect, which occurs when the fuel reaches the fuel pump, could
cause detonation (Reference 95).

Scenario  2: 

Condition: Hydrazine leaks into the solenoid cavity of a fuel
isolation valve or gas generator control valve.

Result: A failure of the valve by means of (1) detonation of
the hydrazine induced by the catalytic action of some material
contained within the cavity, such as nickel plated iron, (2)
electrical shorting of the coil, or (3) detonation caused by
a spark. If the valve does not fail immediately, then later
when the APU is restarted, the hydrazine may have removed
enough electrical insulation to cause either a spark followed
by hydrazine detonation or simply electrically short the coil
(Reference 90).


6.6.3  Events  Related  to  Hot  APU  Exhaust  Gas  Leakage 

The  SIEs  that  result  from  hot  APU  exhaust  gas  leakage  are  those 
in  which  leaking  hot  APU  exhaust  gas  damages  flight  critical 
equipment  or  an  APU. 
6.6.3.1 High Pressure Hot APU Exhaust Gas Leakage

It  may  be  possible  for  high  pressure/high  temperature  gas  to  escape 
from  the  APU  gas  generator  to  the  general  environment  by  means  of  a 
narrow  channel  connecting  the  gas  generator  to  a high  pressure 
transducer. Gas within the gas generator has a design temperature
of 1700°F, and a pressure of 1300 psia (Reference 33). The gas is
expected  to  cool  somewhat  by  passage  through  the  access  channel. 

Because  of  the  leak  location,  only  the  leaking  APU  could  possibly 
be  damaged  by  high  pressure  hot  APU  exhaust  gas  leakage.  Hot  gas 
may  damage  the  APU  wiring  insulation.  This  insulation  is  Teflon 
wrapped  with  Kapton  tape,  and  may  be  destroyed  by  sustained 
exposure to temperatures of 500°F or above. The possibility is
considered  remote  and,  as  a simplifying  assumption  in  this  study, 
the  probability  was  considered  negligible. 


6.6.3.2 Low Pressure Hot APU Exhaust Gas Leakage

As shown in Reference 84, an APU gas generator leak into the APU
exhaust duct can produce exhaust gas temperatures as great as
1600°F without starving the APU turbine due to the loss of hot
gas flow. The APU exhaust duct, constructed of Inconel 600, is
qualified for a temperature of 1160°F at sea level and 1000°F in
space (Reference 103).

Reference  98  shows  that  Inconel  600  suffers  a fairly  rapid  decay 
in strength at temperatures above 1200°F. It may be possible
that  exposure  of  the  APU  exhaust  duct  to  high  temperatures 
resulting  from  a gas  generator  leak,  in  combination  with  APU 
vibration  levels,  could  eventually  lead  to  exhaust  duct  failure. 
However,  there  is  some  confidence  that  the  duct  would  survive  at 
least  one  flight  in  this  condition  without  failure.  It  should  be 
noted  that  no  testing  has  been  done  to  verify  this  opinion. 

The  prime  contractor  currently  recommends  shutting  down  an  APU 
that  shows  a high  exhaust  temperature  indicative  of  a gas 
generator  leak.  However,  NASA-JSC  has  eliminated  exhaust  gas 
temperature  as  an  indicator  for  APU  shutdown  due  to  the 
unreliability of the APU's Exhaust Gas Temperature (EGT)
transducers and to the availability of other indicators of APU health
(Reference 26). NASA-JSC also believes that the possibility
of exhaust duct failure due to a gas generator leak is remote
(Reference 43). Again, the probability was considered negligible
as a simplifying assumption in this study.

Avionics  Bay  Damage 

The APU exhaust plume consists of a mixture of N2, H2, and NH3 gas
at an exhaust duct exit temperature of between 900°F and 1160°F.
At the exit points of any exhaust leak into the aft compartment,
the temperature will be no greater than 1160°F.

The  three  aft  avionics  bays  are  located  on  the  lower  part  of  the 
1307 bulkhead, below the APUs. The configuration of the APU
exhaust  ducts  is  such  that  few  exhaust  leak  locations  would  result 
in  a leak  plume  being  directed  onto  the  exterior  of  one  of  the 
avionics  bays  at  close  range.  A leak  of  the  APU  1 exhaust  duct 
in  the  worst  possible  location  could  result  in  the  leak  plume 
impinging  on  the  upper  surface  of  Avionics  Bay  4 at  a distance 
of  about  6 feet.  For  other  exhaust  leak  locations,  the  direct 
line  distance  to  the  nearest  avionics  bay  is  13  feet  or  more. 

Reference  85  indicates  that  for  the  APU  exhaust  plume  at  sea 
level  conditions,  the  temperature  at  an  axial  distance  from  the 
nozzle exit of 6 feet is less than 400°F. The temperature at an
axial distance of 13 feet is approximately 200°F. It is very
unlikely  that  any  APU  exhaust  leak  would  direct  more  than  a small 
portion  of  the  total  exhaust  gas  flow,  resulting  in  even  lower 
temperatures  at  comparable  distances  from  an  exhaust  leak.  It 
should  also  be  noted  that  the  APU  exhaust  pressure  is  less  than 
10 pounds per square inch (psi), meaning that an exhaust leak
into  the  aft  compartment  is  a diffused  cloud  of  hot  gas  rather 
than  a directed  jet  of  hot  gas. 

Yet  another  mitigating  factor  is  that  the  avionics  equipment  in 
question  is  enclosed  within  an  aluminum  honeycomb  box  (the 
avionics  bay  itself)  covered  by  a one  inch  thick  insulation 
blanket,  and  is  cooled  by  freon  circulating  through  cold  plates. 

In  view  of  the  above  considerations,  the  chances  of  damage  to 
avionics  bay  electronics  equipment  due  to  direct  effects  of  an 
APU  exhaust  leak  are  considered  negligible  for  the  purposes  of 
this  study. 
APU Damage


Damage  to  electrical  wiring  in  the  immediate  vicinity  of  the 
exhaust  duct  is  a credible  event,  particularly  wiring  for  the 
APU  itself.  The  wiring  insulation  is  Teflon  with  Kapton  tape 
wrapping, which is destroyed by sustained exposure to
temperatures of 500°F or above. Temperatures in this range are possible
due  to  an  exhaust  leak,  but  are  still  unlikely  as  they  require  a 
substantial  portion  of  the  APU  exhaust  plume  to  be  diverted  into 
the  aft  compartment. 

It  appears  that  the  elbow  joints  of  the  exhaust  duct  or  the  APU 
to  exhaust  duct  seal  are  somewhat  more  susceptible  to  leaks. 

In  either  case,  the  wiring  susceptible  to  damage  from  the  leak 
is  the  wiring  of  the  leaking  APU.  Thus,  there  may  be  a small 
probability  of  loss  of  an  APU  with  an  exhaust  leak,  and  a much 
smaller  probability  of  loss  of  one  APU  due  to  another  APU's 
exhaust  leak.  The  latter  applies  to  APUs  1 and  2 only;  APU  3 
is  10  feet  away  from  the  nearest  other  APU  exhaust  duct.  The 
probability of these events was considered negligibly small for
this study.

Another  effect  to  consider  is  the  effect  of  the  leaking  APU  exhaust 
on  fluid  lines  of  another  shut  down  APU.  The  extreme  consequence 
of  this  could  be  detonation  of  fuel  in  the  lines.  Periods  of 
potential  exposure  of  a stagnant  fuel  line  in  one  APU  to  another 
APU's  leaking  exhaust  are  limited  to  about  5 minutes  during  Flight 
Control  System  (FCS)  checkout  and  about  20  minutes  during  entry. 
Rough  calculations  indicate  possible  detonation  during  the  entry 
timeframe  if  a high  pressure,  focused  jet  of  APU  1 exhaust 
impinges  on  APU  2 fuel  lines  before  APU  2 start,  or  vice-versa. 

As  concluded  earlier,  such  a high  pressure,  focused  jet  is  of 
negligible  probability. 

Orbiter Aft Compartment Overpressurization

A possible  Orbiter  damage  scenario  is  overpressurization  of  the 
aft  compartment  due  to  the  accumulation  of  exhaust  gas  in  the 
compartment  during  the  period  of  entry,  before  the  vent  doors  open 
at Mach 2.4. The vent doors are closed at the Software Major Mode
(MM 304) transition (EI-5 minutes), so exhaust gas accumulation can
begin  at  this  point.  The  exhaust  gas  pressure  is  approximately 
10  psi  in  space,  and  only  0.3  psid  pressure  is  required  to  cause 
structural  failure  to  the  aft  compartment. 
Calculations show that a leak rate of approximately 10% of the total exhaust
gas flow, starting at MM 304 [Entry Interface (EI)], is required
to  cause  damage  to  the  aft  compartment  structure  before  the  vent 
doors  open.  Such  a leak  rate  would  require  a large  hole  in  the 
duct  and  a mechanism  for  diverting  the  flow  out  of  the  hole. 

This  event  is  also  considered  to  be  of  negligible  probability. 

APU  Exhaust  Gas  Ignition  or  Explosion 

Yet  another  possible  source  of  severe  damage  to  the  vehicle  is 
ignition  of  hydrogen  accumulated  in  the  aft  compartment  due  to 
an  APU  exhaust  leak.  This  is  not  a concern  during  ascent  or 
orbit, because there is insufficient oxygen to support
combustion.  It  could  be  a concern  during  entry  below  60,000 
feet,  where  sufficient  oxygen  exists  to  support  combustion. 
Because  of  the  extremely  low  likelihood  of  significant  gas 
leakage into the aft compartment, and the considerations mentioned
above,  this  failure  event  is  considered  to  be  of  negligible 
probability. 
7.0 APU DATA DEVELOPMENT

This  section  describes  in  detail  the  process  used  to  collect  and 
evaluate  the  failure  data  for  the  APU,  and  also  the  process  used 
to  develop  probability  distributions  for  component  failure  rates 
from  this  data. 

A few comments concerning probability distributions are appropriate
at this point. Probability distributions are used in this context
to reflect the fact that component failure rates are uncertain.

The use of probability distributions provides a complete
description of our state of knowledge about the failure rates of the
equipment  in  question,  including  any  sources  of  variability  among 
similar  components.  By  contrast,  use  of  a single  number,  called  a 
point  estimate,  would  tend  to  imply  a degree  of  exactness  that  is 
not  justified  by  the  data. 

It  is  important  to  bear  in  mind  that  the  existence  of  uncertainty 
about  component  failure  rates  does  not  imply  that  the  results  are 
inaccurate  or  that  they  reflect  a state  of  ignorance  on  the  part  of 
the  analysts.  Rather,  uncertainty  arises  from  a number  of  sources: 


a.  The  relatively  small  amount  of  data  that  is  available  on 
many  components 

b.  The  possibility  of  missing  data  (e.g.,  failures  that  are 
not  captured  by  the  data  collection  process) 

c.  Decisions  about  whether  incipient  failures  should  be 
treated  as  failures  in  the  data  analysis 

d.  Estimation  of  the  applicable  exposure  data  (e.g.,  the 
total  number  of  hours  that  a component  operated) 

e.  The  application  of  data  from  one  situation  (e.g., 
checkout)  to  other  situations  such  as  actual  flights 

f.  The  assumption  that  failure  rates  are  constant  over 
time 

g.  Differences  in  component  reliability  from  one  mission 
to  another  (e.g.,  due  to  differences  in  refurbishment) 


h. Differences in component reliability from one APU to
another, or between similar components in the same APU

i. The extrapolation of failure rate estimates developed
for  other  applications  (e.g.,  aircraft)  to  the  Space 
Shuttle 

j . The  environmental  factors  that  should  be  used  in 
adjusting  failure  rate  estimates  from  one  application 
to  another 

The  approach  used  in  this  study  to  describe  and  quantify  such 
uncertainties  is  the  Bayesian  theory  of  probability.  In  this 
approach, each basic event frequency is described by a probability
distribution specifying the various possible values for
that  frequency  and  the  likelihood  of  each.  The  Bayesian  approach 
is  capable  of  taking  into  account  both  engineering  judgment  about 
the  event  frequency,  and  also  empirical  data  such  as  the  actual 
number  of  failures  that  were  observed  during  operation  of  the  APU. 

In  particular,  a prior  probability  distribution  is  specified  to 
reflect  all  the  available  information  on  similar  components  in 
other  applications,  as  tempered  by  engineering  judgment.  This 
distribution  is  generally  then  updated  with  the  observed  APU  data 
to  yield  a revised  (i.e.,  posterior)  distribution.  In  other 
cases, the posterior distribution is simply set equal to the
prior  distribution,  and  no  update  is  performed.  This  is  done  in 
cases  where  little  relevant  information  is  available  from  other 
sources.  The  available  APU  data  is  therefore  used  to  develop  the 
prior  distribution  instead  of  to  update  it.  In  addition,  no 
update  is  performed  in  cases  where  no  APU  data  is  available  for 
use  in  the  update;  e.g.,  in  modeling  certain  types  of  emergency 
demands  that  have  not  occurred  during  the  operating  experience  of 
the  APUs  to  date. 
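
The prior-to-posterior update described above can be illustrated with a minimal sketch. It assumes a gamma prior on a constant failure rate and Poisson-distributed failure counts, which is a common conjugate pairing; the report does not state which distribution family was actually used, and all names and numerical values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaRate:
    """Gamma distribution over a failure rate (failures per hour).

    The gamma family is an assumption made for this illustration only.
    """
    shape: float       # behaves like a count of "prior failures"
    rate_hours: float  # behaves like "prior exposure hours"

    @property
    def mean(self) -> float:
        return self.shape / self.rate_hours


def update(prior: GammaRate, failures: int, exposure_hours: float) -> GammaRate:
    """Conjugate Bayesian update for Poisson-distributed failure counts."""
    if exposure_hours <= 0.0:
        # No applicable data: the posterior is simply set equal to the prior.
        return prior
    return GammaRate(prior.shape + failures, prior.rate_hours + exposure_hours)


# Hypothetical numbers, not taken from the report.
prior = GammaRate(shape=0.5, rate_hours=1000.0)
posterior = update(prior, failures=0, exposure_hours=4000.0)

print(f"prior mean     = {prior.mean:.2e} per hour")
print(f"posterior mean = {posterior.mean:.2e} per hour")
```

With zero observed failures over the added exposure, the posterior mean falls below the prior mean, which is the qualitative behavior expected when failure-free operating time accumulates.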

The use of judgment is in keeping with the Bayesian theory of
probability, and the judgment of an analysis group that is knowledgeable
about equipment reliability is a valid form of evidence
for use in formulating distributions. Experience has shown that
the judgment of experienced analysts is often remarkably close to
actual data when the two have been compared. For example, several
studies of component reliability have found expert estimates of
component failure rates to be typically within a factor of two to
four from the observed failure rates.

Section 7.1 describes the raw data sources from which APU failure
data was obtained. These sources include such documents as
corrective action reports, anomaly reports, and so on. For most
spatial interaction events (SIEs), virtually no empirical data was
available. Therefore, judgmental distributions were developed for
the frequencies of these events (e.g., the likelihood of damaging
an adjacent APU as the result of a turbine overspeed). The process
used for developing SIE distributions is described in Section 5.8
and the resulting judgmental distributions are described in
Section 7.6. These distributions were based on extensive knowledge
of such events, and also on a number of analytical studies
performed specifically in support of this PRA.

Section 7.3 presents tables summarizing the raw data that was
discussed in Section 7.1. These tables served as the basis for
the data analysis. However, several adjustments were made to
the information in these tables. The guidelines and criteria
that were used to categorize the data according to component
type and failure mode, and also the criteria used for determining
which events (e.g., incipient failures) would be considered
non-failures in this study, are described in Section 7.4.

In  general,  the  criteria  specified  in  Section  7.4  are  fairly 
conservative.  Conservative  in  this  study  means  erring  on  the 
side  of  overestimating  the  frequency  of  events.  For  example, 
grouping  several  similar  components  into  a single  category  for 
purposes  of  data  analysis  can  result  in  narrower  uncertainty 
bounds,  by  increasing  the  amount  of  data  available  for  use  in 
estimating  failure  rates.  Therefore,  in  this  study,  such 
grouping  was  generally  done  only  when  the  components  in  question 
were virtually identical (e.g., for identical components on
different APUs, or for the two isolation valves on each APU).

The  reason  for  this  approach  was  to  avoid  inadvertently 
attributing  inapplicable  data  to  particular  components,  since 
otherwise  an  inappropriate  failure  rate  distribution  could 
result.  To  give  some  indication  of  the  kinds  of  problems  that 
can  result  from  inappropriate  grouping  of  components,  consider 
an  example  involving  two  distinct  types  of  components.  If  one 
type  of  component  fails  once  every  100  hours  and  the  second 
type  fails  three  times  every  100  hours,  then  grouping  the  data 
for  these  two  components  would  give  an  average  failure  rate  of 
twice  every  100  hours,  which  is  not  appropriate  for  either  type 
of  component.  Treating  the  two  types  of  components  separately, 
as  was  done  in  this  study,  gives  slightly  broader  but  more 
accurate  distributions  for  the  component  failure  rates. 
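
The grouping example above can be reproduced with a short calculation. This is only a sketch: it assumes equal exposure (100 hours) for each component type, which the text implies but does not state, and the function name is hypothetical.

```python
def pooled_rate(failures, hours):
    """Pooled point estimate: total failures divided by total exposure."""
    return sum(failures) / sum(hours)


# Values from the example in the text; equal 100-hour exposures are assumed.
failures_by_type = [1, 3]
hours_by_type = [100.0, 100.0]

separate = [f / h for f, h in zip(failures_by_type, hours_by_type)]
pooled = pooled_rate(failures_by_type, hours_by_type)

print("separate rates per hour:", separate)
print("pooled rate per hour:   ", pooled)
```

The pooled value, two failures per 100 hours, overstates the rate of the first type by a factor of two and understates the rate of the second, which is why the two types are better treated separately.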

Section  7.5  presents  the  actual  prior  and  posterior  distributions 
that  were  developed  for  the  categories  of  component  failures 
specified  in  Section  7.4.  The  sources  of  data  used  to  update 
the  distributions  for  the  various  failure  rates  are  indicated. 

The Bayesian analysis that was used to develop the posterior

a. TEST - Development, Qual, Acceptance

b. SIMILARITY - Parts nearly alike; built to same specifications,
same usage, same manufacturer

c. ANALYSIS - Analytical evaluation; e.g., large parts/systems not
lending themselves to tests of such things as expected launch
vibrations

d. OBSERVATION - Actual use of components in flight or in
similar space environment

For example, test and checkout failure data was compiled from
prelaunch FRFs, Hot Fires, and Confidence Runs, which may be
compared to acceptance tests. Similarly, electrical component
failure rates and mechanical device failure rate data were
obtained from established historical sources (References 97
and 99, respectively), which parallels certification by similarity.
Along the same lines, analysis was employed during the development
of the probability distributions for the SIEs. The flight
time accumulated for many of the APU components, without failure,
is in keeping with certification by observation.

Three broad types of data were required: (1) exposure data indicating
how long the various APU components had operated; (2) data
indicating how many failures of each given component had occurred
over the corresponding exposure period; and (3) the failure modes
that were observed.

It  was  obvious  that  utilizing  Qualification  Test  (Qual)  data  would 
not  produce  reasonable  failure  rates.  The  failures  associated 
with  the  Qual  test  program  phase  would  likely  represent  flaws  in 
the  early  design  or  manufacturing  process.  These  failures  would 
not  necessarily  be  indicative  of  the  final  flight  or  production 
components  or  of  later  refinements  in  the  manufacturing  process. 

An exception to the exclusion of Qual test data was made in the
development of the Spatial Interactive Events (SIE). There was
no other available source of information from which to draw
conclusions about unlikely, but catastrophic, occurrences such
as an APU turbine wheel failing in overspeed.

The Acceptance Test (ATP) phase was the next level of component
development for which data was known to be available. This data
was considered to be of value in tracking failures from the time
of contractor component or system delivery to end-of-life.
However, several difficulties were encountered that made it
necessary to exclude the ATP data entirely. They were: (1) the
lack of information on actual design changes resulting from ATP
failures, (2) the inability to screen out facility failures and
anomalies caused by facility or test setups, and (3) the lack of
time and funding available in this study to assure that the failures
observed and documented in the ATP data were representative of
the flight configuration.

Launch checkout and flight data were selected as the most meaningful
data to support this analysis. This data represents the APU
system  in  the  flight  configuration  and  flight  environment. 
Moreover,  it  was  judged  that  any  valid  failure  modes  identified  in 
Qual  or  Acceptance  tests,  and  not  corrected,  would  be  reflected  in 
flight  failure  rates,  thus  reducing  the  effect  of  not  including 
data  from  these  development  categories. 

Several  sources  of  launch  checkout  and  flight  data  were  found 
to  be  available  and  accessible  during  the  study  time  frame  and 
are  described  below.  With  the  exception  of  one  source,  this 
data  existed  in  paper  form  only.  The  exception,  the  APU 
Subsystem  Manager's  database,  was  a computer  resident  file 
with  no  hard  copy  printout  available. 

These  sources  were  utilized  to  develop  failure  histories,  and 
flight  histories  dating  from  1 January  1981  through  flight  #24, 
which  landed  on  18  January  1986.  Other  sources  such  as  NASA/ 
contractor  test  reports  and  discussions  with  knowledgeable 
personnel  were  used  primarily  as  an  information  base  to  assist 
in  the  development  of  probability  distributions  for  the  Spatial 
Interactive  Events. 

The  information  from  all  sources  was  analyzed  using  a specific 
set  of  criteria  necessary  to  track  APU  failures.  The  data  was 
assembled  into  computer  files  according  to  the  criteria 
established. For example, it was necessary to track APU serial
numbers, dash numbers, flight numbers, and flight dates to
prevent duplicating failure and anomaly entries.

The  salient  information  needed  to  develop  flight  failure  rates 
and  mission  timing  sequences  was  compiled  as  a basis  for 
developing  model  input  data.  The  individual  data  sources  and 
their  use  in  this  study  are  provided  and  discussed  below. 

a. Johnson Spacecraft Center (JSC) Orbiter Full Problem Report
(FPR) (Reference 28)

b. Shuttle Flight Data and In-flight Anomaly List (References
31 and 32)

c. JSC Mission Reports, Missions 1 thru 23 (References 1
through 24)

d. The National Aeronautics and Space Administration (NASA) APU
Subsystem Manager's Database

e. Study and test reports from NASA and contractor sources and
published technical documents

The FPR, the Shuttle Flight Data and In-flight Anomaly List, the
JSC Mission Reports and the study and test reports existed only in
paper form. The remaining source, the Subsystem Manager's database,
was resident in a Model 870 VAX computer located in Building
13 at JSC. These sources support various NASA functions, and as a
result, differed as to format and data content. The salient
information from each of the data sources was added to a personal
computer (PC) data base program that provided edit, sort, search,
and print capability. Conflicts found throughout the review and
data compiling process required resolution before a coherent set
of data could be developed. The individual sources are discussed
in the following paragraphs.


7.1.1  JSC  Orbiter  Full  Problem  Report 

The  FPR  was  utilized  as  a prime  source  of  APU  failure  data.  Each 
record  in  this  document  contained  various  types  of  information 
such as Corrective Action Report (CAR) numbers, APU serial numbers,
dates  of  failures,  part  numbers,  and  problem  descriptions,  which 
were  invaluable  in  tracking  and  comparing  failures  from  different 
sources.  Also  included  were  failure  modes  as  well  as  analysis  and 
resolution  comments.  The  following  data  fields  were  utilized  to 
develop  a computer  data  file  needed  to  compile  the  relevant  data 
for  the  study: 

DATA FIELD                  DESCRIPTION

a. Page Reference           FPR page number
b. Test Op                  Test operation (FLT, CKO, ATP, QUAL)
c. Failure Mode             Hardware failure
d. Report Number            CAR number assigned to failure
e. Part Name                Hardware Name
f. Part Number              Hardware Number
g. Serial Number            APU serial number
h. Date Detected            Date anomaly first reported
i. Problem Description      Anomaly description
j. Analysis & Resolution    Recommendations made to correct
                            condition causing failure



One  major  data  element  was  the  report  or  CAR  number.  This 
"common"  number  allowed  failure  correlations  between  JSC  Mission 
Reports  and  the  Shuttle  Flight  Data  and  In-flight  Anomaly  List. 

Based  on  the  ground  rules  established  in  paragraph  7.1,  only 
Flight  (FLT)  and  Checkout  (CKO)  records  were  selected  for  use 
in  this  study. 

Appropriate  information  from  the  FPR  records  was  entered  into  the 
PC database and sorted into FLT and CKO files. The FPR data could
then be compared with data from other sources to assure that
failures were logged only once. By careful review, the first four
data  sources  discussed  in  Paragraph  7.1  were  combined  into  one  data 
file  to  produce  the  Raw  Data  Tables  as  described  in  Section  7.3. 
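
The idea of using the CAR number as a common key, so that a failure reported by several sources is logged only once, can be sketched as follows. The study itself used a commercial PC database program together with manual review to resolve conflicts; the record layout, field names, and sample values below are hypothetical and only illustrate the keying step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    car_number: str   # common key shared across sources
    apu_serial: str
    test_op: str      # "FLT", "CKO", "ATP", or "QUAL"
    source: str       # which document the record came from
    description: str


def merge_sources(records):
    """Keep only FLT and CKO records and log each CAR number once."""
    merged = {}
    for rec in records:
        if rec.test_op not in ("FLT", "CKO"):
            continue  # ATP and QUAL records are excluded by the ground rules
        merged.setdefault(rec.car_number, rec)  # first occurrence wins
    return list(merged.values())


# Hypothetical sample records (not data from the report).
records = [
    FailureRecord("CAR-0001", "SN-A", "FLT", "FPR", "example anomaly 1"),
    FailureRecord("CAR-0001", "SN-A", "FLT", "Anomaly List", "example anomaly 1"),
    FailureRecord("CAR-0002", "SN-B", "CKO", "FPR", "example anomaly 2"),
    FailureRecord("CAR-0003", "SN-C", "ATP", "FPR", "example anomaly 3"),
]

merged = merge_sources(records)
for rec in merged:
    print(rec.car_number, rec.test_op, rec.source)
```

Keeping the first occurrence is a simplification; where two sources disagree about the same CAR number, the conflict would still need the kind of case-by-case review the text describes.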


7.1.2  Shuttle Flight Data and In-flight Anomaly List

The  Shuttle  Flight  Data  and  In-flight  Anomaly  List  is  a historical 
report  of  flight  related  information.  It  also  includes  in-flight 
anomalies  and  references  to  problems  encountered  during  the  STS 
missions. 

The  report  is  divided  into  two  sections.  The  first  section, 
entitled  "Shuttle  Flight  Data,"  provided  the  following  mission- 
related  information: 

a.  APU  serial  and  dash  number 

b.  APU  position 

c.  Launch  and  launch  scrub  dates 

d.  Initial  altitude  and  inclination 

e.  Flight  sequence  number,  flight,  and  Orbiter  number 

f.  Solid  Rocket  Booster  (SRB)  Separation  time 

g.  Thrust  bucket  throttle  times 

h.  Main  Engine  Cut-off  (MECO)  time 

i. Other flight related data

The second section, entitled "Shuttle In-flight Anomaly List",
provided a list of significant anomalies that occurred on STS
missions. The anomalies of interest for this study were for the
Auxiliary Power Unit under a Work Unit Code (WUC) designator V46.
The type of information gathered was a brief description of the
anomaly, the STS flight problem number and/or the CAR number
associated with the anomaly.

APU failure data from these two sections were combined to make up
an "APU Flight" and "APU In-flight Anomaly" data file similar to
that used to compile the FPR data. This data file was used as one
of the four data sources from which comparisons were made prior to
the development of the Final Data Tables. The flight related
portions of the data base, such as SRB separation time and thrust
bucket throttle time, were used to develop a "Study" mission database,
combining mission sequence, flight, and Orbiter tail numbers
from the JSC Mission Reports.


7.1.3  The NASA Subsystem Manager's Database

This  data  source  consists  of  three  separate  historical  data  files 
containing  information  from  Flights  1 thru  24,  and  is  maintained 
by the NASA APU Subsystem Manager (SSM). The three files are
entitled:  (1)  Flite  2,  (2)  Operational  History,  and  (3)  Hardware. 

Only  data  files  1 and  2 were  used  for  the  APU  study. 

The  Flite  2 data  file  is  essentially  a compilation  of  APU 
anomalies  tracked  by  the  SSM.  The  data  fields  included  were: 

a.  APU  position  and  serial  number 

b.  Mission  ID  (STS  Reference) 

c.  Anomaly  date 

d.  Component  name 

e.  Vehicle  number 

f.  Anomaly  phase  and  description 

g.  Action  taken 

The  Flite  2 data  file  assisted  in  determining  which  specific  APU 
component  (e.g.,  fuel  pump,  gas  generator,  relief  valve)  had 
failed  when  the  FPR  data  only  listed  the  anomaly  or  failure  as  an 
APU  without  regard  to  a part  number. 

The  Operational  History  data  file  provided  APU  run  times  by  APU 
position  and  serial  number.  The  data  fields  included  were: 


a.  APU  Position  and  Serial  Number 

b.  Run  Time  Event  (FRF,  STS  Flight,  Launch  scrub,  etc.) 

c.  Flight  Date  (Date  of  Run  Time  Event) 

d.  Run  Time  in  Decimal  Minutes 

e. Pre/Post FLT Test (FLT or Test Time)

This file provided the APU run times associated with Flight
Readiness Firings (FRF), Launch Scrubs, Confidence Runs (CR) and
Checkout Operations at Kennedy Space Center (KSC), and Checkout
at the Sundstrand facility.

These run times were divided into three categories: (1) Flight
(FLT), (2) Checkout (CKO), and (3) Not Applicable (N/A). The
flight run time corresponded to APU operation during past missions.
The checkout time corresponded to APU CRs, Hot Fires, FRFs, and APU
operation during launch scrubs. All Sundstrand run times were
classified "N/A", as there was no means to determine the test
configuration. A Sundstrand test configuration, for example,
might not include a flight-qualified controller or the APU flight-type
tank, lines and isolation valves. The test setups most
likely would not have included all of the flight instrumentation.

In  other  words,  an  accumulation  of  APU  system  component  operating 
times  could  not  reliably  be  determined.  Failure  rates  of 
individual  components  comprising  the  APU  were  considered  outside 
the  scope  of  this  study. 

The  data  was  sorted  to  provide  run  times  in  the  different 
categories,  and  was  compared  with  the  APU  flight  run  times  as 
obtained  from  the  JSC  Mission  Reports  (See  7.1.4).  The  CKO  run 
times  were  accumulated  separately,  to  be  used  in  conjunction 
with  checkout  failures  as  obtained  from  the  FPR  data  in 
determining  checkout  failure  rates. 
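
The sorting and accumulation step described above amounts to summing run time per category and dividing a failure count by the accumulated exposure. The sketch below assumes run times in decimal minutes, as in the Operational History file; the function names and the sample entries are hypothetical and are not data from the report.

```python
from collections import defaultdict


def run_time_by_category(entries):
    """Sum run time in minutes for each category (FLT, CKO, N/A)."""
    totals = defaultdict(float)
    for category, minutes in entries:
        totals[category] += minutes
    return dict(totals)


def failure_rate_per_hour(failures, run_minutes):
    """Point estimate of a failure rate from a count and an exposure."""
    return failures / (run_minutes / 60.0)


# Hypothetical entries: (category, run time in decimal minutes).
entries = [
    ("FLT", 90.0),
    ("CKO", 10.5),
    ("CKO", 20.25),
    ("N/A", 45.0),   # e.g. vendor test time with unknown configuration
    ("FLT", 30.0),
]

totals = run_time_by_category(entries)
print(totals)
print(failure_rate_per_hour(failures=1, run_minutes=totals["FLT"]))
```

A point estimate of this kind is only an input to the distributions discussed elsewhere in this section; it does not by itself express the uncertainty in the rate.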


7.1.4  JSC  Mission  Reports 

The  JSC  Mission  Reports  were  used  to  obtain  mission  related  data. 
These  reports  were  also  used  as  references  when  mission  information 
obtained  from  other  data  sources  required  further  clarification. 

The  mission  reports  proved  to  be  very  valuable  during  the  course 
of  this  study.  They  were  utilized  to  determine: 


a. Lift-off (L/O) time

b. APU run times for ascent, orbit, and entry

c. Time of entry interface (EI)

d. Blackout end

e. Terminal area energy management (TAEM)

f. Touchdown (TD)

g. Wheelstop (WS)

h. APU deactivation time


It was necessary to accumulate APU run times by flight phases to
determine the variation in run time during these periods. Since
the SSM's Operational History file did not separate these times
into the needed phases (ascent, orbit, and entry), the JSC Mission
Reports were used as the baseline source of APU on/off/duration
run times for the study. The total APU run times from the SSM's
database were compared to those obtained from the JSC Mission
Reports, and less than a 1% difference was noted. Therefore, it
was concluded that the APU run times extracted from the mission
reports would be adequate for use in the development of the
mission-related database as shown in Appendix B7.3.


7.1.5  Study Reports, Test Results, & Personal Communications

Some  of  the  failure  modes  under  consideration  during  this  study 
have  a very  low  likelihood  of  occurrence.  Directly  applicable 
test data does not exist for some failure modes; e.g., some
catastrophic  SIEs.  In  order  to  estimate  these  likelihoods, 
information  from  a large  number  of  study  and  test  reports  from 
NASA  and  contractor  sources  and  other  technical  publications 
was  utilized. 

Valuable  information  used  to  supplement  the  written  reports  was 
obtained  through  telecons  with  various  knowledgeable  people  in 
specialized  fields  at  JSC  and  other  locations.  It  was  discovered 
during  the  study  that  tests  were  in  progress  at  White  Sands 
Proving Grounds on the properties of hydrazine and its effect on
certain materials. The results of these tests might have influenced
some of the hydrazine-related assessments in this study.
However, the results were not available for consideration and
application for this study.


7.2  SPATIAL  INTERACTIVE  EVENT  DATA 

This  section  presents  the  APU  SIE  split  fraction  distributions 
in  the  format  used  for  entry  into  the  PRA  model.  Table  7.2-1 
presents  the  data  relevant  to  ascent  and  Table  7.2-2  presents 
the data relevant to entry. These distributions and the information
supporting their development are presented individually
in Section 7.6 and are presented here for clarity and convenience.

SPATIAL INTERACTIVE EVENT APU ASCENT DISTRIBUTIONS

FA1F17  Failure of 2 APUs or FCE Given Small Leaks in at Least 2 APUs  8.0800E-03  1.9280E-03  6.2260E-06  1.9587E-02

SPATIAL INTERACTIVE EVENT APU ENTRY DISTRIBUTIONS


7.3  RAW DATA TABLE DEVELOPMENT


This section summarizes the vast amount of data collected to
support two basic needs of the study: (1) determination of
observed APU failure frequencies, and (2) establishment of a
typical mission time reference from which probabilities of
occurrence could be calculated. One example of the latter was
the need to determine the variation in Entry Interface (EI)
times to obtain the average time for the start of two APUs
during entry.

There were three sources of failure frequencies: (1) actual
flight experience, (2) failure rates based on similarity data
from accepted sources, and (3) failure rates derived from
engineering judgment, supported by limited historical data.

A commercial  software  database  program  was  utilized  to  compile, 
manipulate  and  format  the  data  for  sorting  and  printing.  Two 
separate  databases  were  developed  from  the  sources  discussed 
in  the  previous  paragraphs:  a failure  history  database  and  a 
mission  event  database.  These  databases  are  discussed  in 
paragraphs  7.3.1  and  7.3.2.  The  failure  rates  of  electrical 
items are described in paragraph 7.3.3.


7.3.1  Failure  History  Database  and  Output 

The  Failure  History  database  was  developed  to  compile  flight 
failures  and  checkout  failures  from  the  sources  identified 
previously.  This  database  consists  of  (1)  the  APU  Flight 
Failure  data  file,  and  (2)  the  APU  Checkout  Failure  data  file. 

The Flight Failure Data Tables (Appendix B7.3-1) represent
records created from the APU Flight Failure Data file. The
data fields are categorized as (1) the mission sequence number
and mission ID number which define the mission on which the
failure was cited; (2) the APU position, serial number, and
dash number; (3) the component part number and name of the
failed hardware; (4) the failure mode and problem description
which provide descriptive information; and (5) a resolution and
additional comments for each failure. The data fields such as
problem number, JSC document number, CAR number, Boeing page
number, and source code are used only as reference material.
The "source code key" can be used as reference to identify the
source of the information contained in the record.

The Checkout Failure Data Tables (Appendix B7.3-1) represent
records from the APU Checkout Failure data file. The data
fields of interest are: (1) the planned mission date and the
STS mission ID number; (2) the APU position, serial number, and
APU dash number; (3) the vehicle number; (4) the component part
number and nomenclature of the failed hardware; (5) the failure
mode and operational phase during which the failure occurred
(anomaly phase); (6) problem description and comments that define
the failure and subsequent resolution; (7) the date the failure
was cited; and (8) the CAR number and Boeing page number which
provide a reference to the information source.

The  data  was  sorted  according  to  part  number/name  to  display  the 
failure  modes  and  number  of  failures  per  component  observed  in 
the historical data file. The next step in the data development
process was to categorize the data as discussed in paragraph 7.4.


7.3.2  Mission  Related  Database  and  Output 

Early in the course of the study, it was thought that "flight"
run times of the APUs should be tracked separately from the
ground based "checkout" times. Two database files were generated
to track and accumulate this information.

The APU flight run times were taken from the JSC Mission Reports
for flights 1 through 23, since the reports listed the times
by mission phase. However, the mission reports did not list
"check-out" run times. The SSM's database listed these times of
interest as Confidence Runs (CR), Flight Readiness Firings (FRF),
Launch Aborts (LA), and Hot Fires. This data included flight run
times but was not divided into mission phases. Therefore, it was
decided to use the SSM's data for the prime source of APU checkout
run time (CKO) and the JSC Mission Reports for the flight (FLT)
run time. Subsequent comparison of the flight run times between
the two sources indicated a difference of less than 1%.

Appendix  B7.3-2  shows  the  APU  flight  run  times  by  STS  Mission  and 
mission phase. The total run time for all APUs was calculated to
be  6,258.96  minutes. 

Appendix  B7.3-2,  pages  7 through  9,  shows  the  checkout  run  times, 
calculated  to  be  331.19  minutes.  The  sum  of  flight  and  checkout 
run  times,  6,590.15  minutes,  was  the  basis  for  calculating 
component failure rates for an operating APU.

During the orbit period, when an APU is not operating, certain
failure scenarios are still valid. Leaks can occur or heaters
can fail "on" or "off". Appendix B7.3-2, pages 1 and 2, shows
the  accumulation  of  all  mission  times  from  APU  start  to  APU 
shutdown.  This  value  of  3,671.1367  hours  represents  the 
exposure  time  of  a single  APU  to  events  such  as  leakage.  The 
total  exposure  time  used  in  this  study  was  three  times  this 
value  plus  the  time  accumulated  during  checkout,  for  a total  of 
11,018.9299  hours.  This  represents  the  total  exposure  time  of 
a flight-configured  APU  to  a leakage  environment. 
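
The exposure totals quoted in this subsection can be cross-checked with a few lines of arithmetic. The input values are taken from the text; the constant names are hypothetical, and converting the checkout run time from minutes to hours before adding it to the mission time is an assumption, adopted here because it reproduces the quoted total.

```python
# Values quoted in Section 7.3.2.
FLIGHT_RUN_MIN = 6258.96              # total flight run time, all APUs (minutes)
CHECKOUT_RUN_MIN = 331.19             # total checkout run time (minutes)
SINGLE_APU_MISSION_HOURS = 3671.1367  # APU start to APU shutdown, one APU (hours)
APUS_PER_ORBITER = 3

# Exposure for failures of an operating APU.
operating_exposure_min = FLIGHT_RUN_MIN + CHECKOUT_RUN_MIN

# Exposure to events such as leakage, which can occur while an APU is idle.
leakage_exposure_hours = (
    APUS_PER_ORBITER * SINGLE_APU_MISSION_HOURS + CHECKOUT_RUN_MIN / 60.0
)

print(f"operating exposure: {operating_exposure_min:.2f} minutes")
print(f"leakage exposure:   {leakage_exposure_hours:.4f} hours")
```

Both results agree with the figures given in the text (6,590.15 minutes and 11,018.9299 hours).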

Additional mission related data is shown in Appendix B7.3-2, page
15, including mission timing parameters such as EI, TAEM, Touchdown
(TD), and Wheelstop (WS). Appendix B7.3-2, page 16, shows
additional ascent mission related data.


7.3.3  Treatment  of  Electrical  Components 

The  APU  electronic  controller  was  treated  as  a "black  box". 
Controller  failures,  as  found  in  the  flight  history  data,  could 
be  tracked  only  to  the  point  that  a problem  required  removal  and 
replacement  of  the  controller  itself.  Excessive  effort  and  time 
would  have  been  required  to  determine  from  the  vendor  what 
individual component(s) within the box had caused the problem.
Therefore,  failure  rates  were  estimated  for  the  box  itself, 
rather  than  for  components  within  the  box.  References  71  and  74 
were  the  source  for  development  of  the  basic  controller  failure 
rates,  along  with  information  gained  from  Reference  86. 

There  are  electrical  components  external  to  the  controller  for 
which  failure  rates  were  required.  These  components  include 
switches, diodes, hybrid drivers, and Remote Power Controllers
(RPC). The available flight history data did not reveal any
failures of these components in the APU system. In cases such
as this, other means can be employed to estimate failure rates.
MIL-HDBK-217D (Reference 97) and NPRD3 (Reference 99) were
utilized to estimate failure rates of electrical components.

MIL-HDBK-217D provides the raw input data for determining
failure rates of electrical components external to the APU
controller. Fault trees (Reference 71) were developed to depict
the failure scenario between the components and the end items of
concern; e.g., the isolation valves. The result of this
analysis was used as one input to the development of probability
distributions for the failure rate of electrical components in
the risk model. Development of the probability distributions is
discussed in Sections 7.4 and 7.5.


7.4  FAILURE  HISTORY  DATA  CATEGORIZATION 

A number  of  guidelines  and  criteria  were  established  for  the  APU 
data  categorization  task.  They  are  discussed  below. 

1.  Failures  occurring  before  January  1,  1981,  were  omitted 
from  the  data  base  on  the  grounds  that  the  APU  was  still 
undergoing  design  development  prior  to  that  time. 

2.  Failures  occurring  during  qualification  tests  (QUAL)  and 
acceptance  tests  (ATP)  were  not  included  in  the  database 
for  this  project.  These  tests  were  thought  to  be  largely 
inapplicable,  on  the  basis  that  bench  tests  of  individual 
components  or  subassemblies  might  not  reflect  the  actual 
operation  of  a completed  APU.  In  addition,  since  these 
tests  are  often  performed  early  in  the  process  of  readying 
an  APU  for  flight,  they  detect  many  types  of  failures  that 
would  not  be  expected  during  an  actual  flight. 

3.  Failures occurring both during checkout tests (CKO) and
during  actual  flights  (FLT)  were  included  in  the  database. 

It  was  recognized  that  some  types  of  checkout  failures 
might  not  be  expected  to  occur  during  flight.  However,  it 
was  decided  to  include  checkout  data  in  the  APU  database. 

In  particular,  checkout  tests  occur  far  enough  along  in  the 
process  of  readying  an  APU  for  flight  that  they  were  judged 
to  reasonably  reflect  the  condition  of  APUs  during  flight. 

4.  All  checkout  failures  were  carefully  reviewed  to  determine 
whether  they  would  actually  be  applicable  to  flight 
situations.  In  particular,  an  attempt  was  made  to  include 
only  those  checkout  failures  that  occurred  during  hot 
firing of an APU (as opposed to bench tests of individual
components,  helium  leak  tests,  and  so  on).  However,  it 
was  not  always  possible  to  make  this  determination  from 
the  available  data.  When  in  doubt,  checkout  failures  were 
conservatively  included  in  the  database. 

5.  Failures reported as having been detected during refurbishment
were not included in the data base unless it seemed
likely that they actually occurred during a previous flight.
The purpose of this ground rule was to avoid the inclusion
of maintenance and refurbishment errors that were successfully
detected and resolved before the completion of the
refurbishment process.

6.  Failures  arising  from  maintenance  or  refurbishment  problems 
were  included  in  the  data  base  if  they  were  detected  during 
flights  or  hot  firings  (i.e.,  if  they  were  not  successfully 
resolved  before  the  completion  of  the  refurbishment 
process).

7.  Incipient failures (e.g., lube oil contamination or turbine
blade cracking) were included in the data base only if
their consequences were judged to be of sufficient likelihood
and severity to be worth modeling. Examples of the
types of incipient failures excluded under this criterion
are: (1) unusually high gearbox heat retention that did
not result in excessive gearbox temperature, (2) unexpectedly
high vibration that did not exceed the redline level, and
(3) valve cover leaks that did not result in valve failure.

8.  A similar  guideline  was  applied  to  components  operating 
slightly  outside  of  their  intended  specifications.  Such 
problems  were  included  in  the  database  for  the  APU  risk 
assessment  only  if  they  resulted  in  component  failure, 
interfered  with  a vital  function,  or  violated  established 
flight  rule  limits.  According  to  this  rule,  for  example, 
problems  resulting  in  turbine  speeds  below  80%  or  above 
129%  would  have  been  considered  failures.  Problems 
resulting  in  a fluctuating  turbine  speed  that  nonetheless 
remained  within  the  above  limits  would  not  have  been 
considered  failures. 

9.  Failures of noncritical components (e.g., temperature transducers
or redundant valves) were included in the database if
situations  could  be  identified  where  these  components  would 
be  important.  For  example,  failures  of  the  injector  cooling 
system  were  included  even  though  the  system  is  not  used 
during  normal  operation,  since  it  is  required  for  hot 
restart  of  an  APU. 

10.  Failures  of  some  components  that  do  not  appear  in  our  APU 
model  were  nonetheless  included  in  the  database.  This 
was  done  if  it  was  judged  that  the  failure  rates  for  these 
components  would  be  substantially  the  same  as  the  failure 
rates  for  other  components  that  were  being  modeled.  For 
example,  all  pressure  transducer  failures  were  included  in 
the database, even though only the gearbox pressure transducer
was actually modeled. Grouping similar components
in this manner resulted in narrower uncertainty bounds for
certain components, by increasing the amount of data available
for use in estimating failure rates.

11.  Data  for  components  that  are  significantly  different  in 
design  and/or  operation  were  not  grouped.  For  example,  data 
for  the  isolation  valves  was  analyzed  separately  from  data 
for  the  gas  generator  valves,  since  the  gas  generator  valves 
experience pulsing operation. Analyzing such components
separately ensured that large amounts of inapplicable data
were not attributed to any particular component.

12.  Even  data  for  very  similar  components  was  not  grouped  if 
the  components  in  question  have  different  failure  modes. 

For example, even though the primary and secondary fuel
control valves are of virtually identical design, the primary
valve  is  normally  open  while  the  secondary  valve  is  normally 
closed.  Therefore,  data  for  these  two  valves  was  generally 
analyzed  separately. 

13.  The  adoption  of  corrective  actions  in  response  to  particular 
failures  was  evaluated  on  a case-by-case  basis  to  determine 
whether  the  action  in  question  would  actually  be  effective  in 
preventing  a recurrence  of  the  problem.  For  example,  major 
design  changes  such  as  the  removal  or  addition  of  a valve 
would  definitely  be  taken  into  account.  However,  for  many 
corrective actions (e.g., improved cleanliness procedures),
it was not possible to determine with a high degree of confidence
that they would actually be successful in preventing a
recurrence  of  the  problem.  Therefore,  a more  detailed  review 
of  corrective  actions  may  prove  worthwhile  for  those  failures 
that  are  found  to  be  dominant  contributors  to  the  total  risk 
of  APU  failure. 
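
The mechanical parts of the guidelines above (the 1981 cutoff, the
test-type screen, and the turbine speed limits of guideline 8) can be
sketched as a simple filter. This is an illustration only: the record
layout and function name are hypothetical, and the judgment-based
guidelines (4 through 7 and 9 through 13) are not represented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CUTOFF = date(1981, 1, 1)            # guideline 1: earlier failures omitted
EXCLUDED_SOURCES = {"QUAL", "ATP"}   # guideline 2: qualification/acceptance tests
INCLUDED_SOURCES = {"CKO", "FLT"}    # guideline 3: checkout tests and flights
SPEED_LIMITS_PCT = (80.0, 129.0)     # guideline 8: turbine speed limits


@dataclass
class FailureRecord:
    """Hypothetical record layout, not the study's actual data format."""
    occurred: date
    source: str                               # "QUAL", "ATP", "CKO", or "FLT"
    turbine_speed_pct: Optional[float] = None  # set only for speed anomalies


def include_in_database(rec: FailureRecord) -> bool:
    """Apply only the mechanical screening rules to one record."""
    if rec.occurred < CUTOFF:
        return False
    if rec.source in EXCLUDED_SOURCES or rec.source not in INCLUDED_SOURCES:
        return False
    if rec.turbine_speed_pct is not None:
        low, high = SPEED_LIMITS_PCT
        # A speed anomaly counts as a failure only outside the limits.
        return rec.turbine_speed_pct < low or rec.turbine_speed_pct > high
    return True
```

Records that pass this screen would still need the case-by-case review
described in guidelines 4 and 13.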

Based on the guidelines and criteria established above, distributions
were developed for the frequencies of various types of
components  and  component  failure  modes.  The  components  used  for 
the  APU  ascent  and  descent  models  are  specified  in  Table  7.4-1. 

COMPONENT CATEGORIES CONSIDERED IN THE APU MODEL

COMPONENT CATEGORY    FAILURES    SPECIFIC COMPONENT(S)


7.5  FAILURE RATES


Once  the  data  has  been  categorized  as  a basis  for  determining  the 
components  and  failure  modes  for  which  failure  rate  distributions 
will  be  needed,  the  next  step  is  to  specify  prior  distributions  for 
those  failure  rates.  After  that,  one  must  specify  the  relevant 
data  for  each  component  failure  mode  (i.e.,  the  number  of  observed 
APU  component  failures  and  the  number  of  operating  hours  and/or 
demands to which each component was subject). Finally, the data
must  be  combined  with  the  prior  distributions  to  yield  posterior 
distributions.  The  results  of  these  three  steps  are  presented  in 
the  sections  below. 


7.5.1  Development  of  Prior  Distributions 

A number of sources were used as background information in developing
prior distributions. These include the Nonelectronic Parts
Reliability Data (NPRD) handbook, prepared by the Rome Air Development
Center; MIL-HDBK-217D (used for electronic components); the
Reliability Engineering Data Series report on Failure Mechanisms,
prepared by the Avco Corporation; NASA operating life limits for
the APU; and the engineering judgment of the analysis team (based
on previous risk assessments and data analyses).

In  many  cases,  adjustments  to  the  information  obtained  from  these 
sources  were  needed.  For  example,  many  of  the  failure  rate 
estimates  obtained  from  NPRD  were  for  components  in  aircraft  or 
ground-based  environments  rather  than  missile  environments. 
Environmental  adjustment  factors  were  judged  to  be  a reasonable 
way  to  account  for  many  of  these  differences;  factors  for  this 
purpose  were  obtained  from  the  Avco  Failure  Mechanisms  report. 

In  addition,  all  the  failure  rate  estimates  in  NPRD  are  presented 
on a per-hour basis (H), while many of the failure rates for the
APU risk study were needed on a per-demand basis (D). In such
cases,  the  number  of  demands  per  hour  in  a typical  application 
was  estimated  as  a basis  for  converting  the  failure  rate  to  the 
desired  units. 
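
The two adjustments described above (an environmental factor, followed
by conversion from a per-hour to a per-demand basis) amount to simple
arithmetic. The sketch below shows one way to write it; the function
name and all numerical values are hypothetical and are not taken from
NPRD or the Avco report.

```python
def per_demand_rate(rate_per_hour: float,
                    demands_per_hour: float,
                    env_factor: float = 1.0) -> float:
    """Convert a handbook failure rate (failures/hour) to failures/demand.

    env_factor is a multiplier that adjusts the handbook environment
    (e.g., aircraft or ground-based) to the environment of interest.
    """
    if demands_per_hour <= 0:
        raise ValueError("demands_per_hour must be positive")
    return rate_per_hour * env_factor / demands_per_hour


# Hypothetical inputs: 2.0e-6 failures/hour in the handbook environment,
# an environmental factor of 10, and 4 demands per hour in a typical
# application.
example_rate = per_demand_rate(2.0e-6, demands_per_hour=4.0, env_factor=10.0)
```

The units work out as (failures/hour) divided by (demands/hour), which
leaves failures per demand.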

In  a few  cases,  estimates  were  not  available  from  sources  such 
as  NPRD  or  MIL-HDBK-217D,  and  the  judgment  of  the  analysis 
team  provided  little  guidance  for  the  development  of  prior 
distributions.  In  such  cases,  observed  APU  failure  experience 
was  used  in  the  development  of  the  prior  distribution.  These 
distributions  were  not  subsequently  updated,  since  the  relevant 
data  had  already  been  incorporated  into  the  prior. 

Finally, after the initial assessment of prior distributions, the
distributions  for  similar  components  or  related  failure  modes 
were  compared  with  each  other  as  a reasonableness  check.  For 
example, the failure rates for different types of rotating equipment
(e.g., the turbine, pumps, and gearbox) were compared to
assure  that  they  were  roughly  comparable,  and  that  the  assigned 
failure  rates  were  consistent  with  engineering  knowledge,  such 
as  the  differing  speeds  at  which  the  various  types  of  equipment 
operate.  Similarly,  the  rates  of  leaks  from  pump  seals,  tanks, 
and  diaphragms  were  compared  to  ensure  that  the  more  vulnerable 
components  were  assigned  the  higher  leak  rates. 

This  type  of  comparison  was  intended  to  assure  that  the  various 
failure  rates  reflected  the  correct  relative  ranking.  The 
comparison  process,  which  was  especially  important  since  many 
of  the  prior  distributions  were  based  on  different  data  sources 
and/or  different  applications,  did  result  in  the  adjustment  of 
several  distributions  to  correspond  more  closely  with  what  the 
analysis  team  considered  realistic  for  application  to  the  Space 
Shuttle. 

Table 7.5-1 presents the prior distributions that resulted from
this process. For each distribution, the table contains the
category of components to which the distribution applies, the
relevant failure mode or modes, the 5th and 95th percentiles of
the  prior  distribution,  and  the  sources  used  in  developing  that 
prior  distribution.  Engineering  judgment  is  nearly  always  used 
in  the  development  of  distributions,  because  there  is  rarely 
enough  data  to  unambiguously  specify  a distribution. 

Virtually  all  the  prior  distributions  were  assumed  to  be  lognormal 
in  form,  as  is  common  practice  in  PRAs.  For  these  distributions, 
the  medians  can  be  found  as  the  geometric  mean  of  their  5th  and 
95th percentiles. The only exception to the assumption of lognormality
is the conditional frequency of leaks in the fuel
systems  of  additional  APUs,  given  that  one  APU  is  leaking. 

Because  the  95th  percentile  of  this  frequency  was  quite  high, 
a lognormal  distribution  would  not  have  been  reasonable;  in 
particular,  it  would  have  allowed  conditional  probabilities  of 
leaks  to  be  significantly  greater  than  1.0.  Therefore,  a beta 
distribution  was  used  for  this  parameter  instead  of  a lognormal 
distribution. 
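
The relationship between the tabulated percentiles and the lognormal
parameters can be sketched as follows. The helper names are
illustrative, and the percentile values in the example are hypothetical;
they are chosen only to show why a lognormal form is unsuitable when
the 95th percentile of a conditional probability is close to 1.0.

```python
import math

Z95 = 1.6448536269514722  # 95th percentile of the standard normal


def lognormal_from_percentiles(p05: float, p95: float):
    """Return (median, sigma, mean) of the lognormal with these percentiles."""
    median = math.sqrt(p05 * p95)                 # geometric mean
    sigma = math.log(p95 / p05) / (2.0 * Z95)     # std. dev. of ln(x)
    mean = median * math.exp(0.5 * sigma ** 2)
    return median, sigma, mean


def prob_above(x: float, median: float, sigma: float) -> float:
    """Probability that the lognormal variable exceeds x."""
    z = (math.log(x) - math.log(median)) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical conditional probability with a high 95th percentile.
median, sigma, mean = lognormal_from_percentiles(0.05, 0.9)
mass_above_one = prob_above(1.0, median, sigma)
```

With these hypothetical inputs a few percent of the lognormal
probability mass lies above 1.0, which is not meaningful for a
conditional probability; a beta distribution is bounded on the interval
from 0 to 1 and avoids the problem.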


7.5.2  Specification of Failure Data


Once  prior  distributions  have  been  developed  for  each  category  of 
components  and  each  failure  mode,  the  next  step  is  to  specify  the 
relevant data for each category (i.e., the number of observed
component failures of each type, and the number of operating
hours (H) and/or demands (D) to which each component was subject,
which is referred to as exposure data).

The  estimation  of  exposure  data  is  a difficult  process.  It 
requires  determination  of  whether  the  relevant  failure  mode  is 
likely  to  occur  over  time  or  on  a per-demand  basis;  whether 
the  failure  mode  can  occur  at  any  time  or  only  when  the  APU  is 
operating;  and  whether  a failure  would  likely  be  detected  if 
one  occurred.  For  example,  failures  of  some  types  of  redundant 
components  may  not  be  detectable  during  normal  APU  operation. 

As  an  illustration,  the  relevant  exposure  data  for  failure  of 
the APU fuel pump to run was taken to be 110 hours, the total
amount of run time accumulated on all APUs to date during flight
and checkout (CKO) hot firings. For failures of passive components
(e.g., tank leaks), the relevant exposure data was taken
to be 11,019 hours. This is based on the total amount of run
time plus on-orbit time accumulated on all APUs to date during
flights,  and  also  the  small  amount  of  run  time  involved  in  CKO 
hot  firings.  Finally,  for  demand-based  failures,  the  number  of 
demands  experienced  by  a typical  component  during  flights  and 
hot  firings  was  calculated  to  be  217.  This  total  assumes  that 
the  component  in  question  experiences  exactly  one  demand  during 
each  firing  of  an  APU,  and  is  made  up  of  several  contributions: 

a.  Two  demands  (one  during  ascent  and  one  during  descent)  for 
each  of  three  APUs  on  24  missions,  for  a total  of  144  demands 

b.  One additional demand for a single APU on each of the 24
missions (for the on-orbit checkout run), for a total of
24 demands

c.  One demand for a single APU during each of the 46 CKO hot
firings, for a total of 46 demands

d.  One demand for each of the three APUs during the confidence
run (prior to the first flight), for a total of three
demands

Care must be taken in applying these values to particular components,
however, to assure that they are applicable. For example,
the  exposure  data  for  isolation  valves  opening  or  closing  on 
demand  was  taken  to  be  434  demands  instead  of  217  demands,  since 
there are two isolation valves in each APU. Similarly, the exposure
data for the GGVM secondary valve leaking after successful
closure  was  taken  to  be  only  10,909  hours  instead  of  11,019  hours, 
because  the  secondary  valve  would  have  been  open  during  the  110 
hours  of  actual  APU  operation  and  thus  could  not  have  leaked 
during  that  time.  As  a final  example,  it  was  assumed  that  there 
was effectively no exposure data for loss of the automatic shutdown
signal from the APU controller, since a failure leading to
loss  of  the  shutdown  signal  would  most  likely  have  gone 
undetected  unless  a shutdown  became  necessary  during  flight. 
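
The exposure figures quoted above can be reproduced with a few lines of
arithmetic. The variable names are illustrative; the input values are
those given in the text.

```python
MISSIONS = 24
APUS_PER_ORBITER = 3
CKO_HOT_FIRINGS = 46
RUN_HOURS = 110          # run time, flights plus CKO hot firings
PASSIVE_HOURS = 11019    # run time plus on-orbit time

# Contributions a through d to the demand count for a typical component.
ascent_descent = 2 * APUS_PER_ORBITER * MISSIONS   # item a
on_orbit_checkout = 1 * MISSIONS                   # item b
cko_firings = CKO_HOT_FIRINGS                      # item c
confidence_run = APUS_PER_ORBITER                  # item d
typical_demands = (ascent_descent + on_orbit_checkout
                   + cko_firings + confidence_run)

# Component-specific adjustments described in the text.
isolation_valve_demands = 2 * typical_demands            # two valves per APU
secondary_valve_leak_hours = PASSIVE_HOURS - RUN_HOURS   # valve open while running

print(typical_demands, isolation_valve_demands, secondary_valve_leak_hours)
```

The totals agree with the values stated in the text: 217 demands, 434
demands, and 10,909 hours.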

Table  7.5-2  presents  the  prior  distribution  and  the  failure  and 
exposure  data  for  each  basic  event  included  in  the  analysis.  As 
can  be  seen  from  that  table,  the  prior  distributions  for  some 
events  were  not  updated  and  were  used  directly  as  posterior 
distributions,  because  all  relevant  failure  data  for  those  events 
had  already  been  used  in  developing  the  priors.  For  reference 
purposes,  Table  7.5-3  provides  descriptions  of  the  actual  failures 
indicated  in  Table  7.5-2.  This  provides  a complete  description  of 
the  information  that  was  input  to  the  Bayesian  updating  process. 


7.5.3  Development  of  Posterior  Distributions 

The  Bayesian  updating  process  was  performed  using  the  RISKMAN  4 
computer  software  on  a desktop  personal  computer.  The  results 
of  this  process  are  shown  in  Table  7.5-4.  This  table  shows  the 
mean  frequency  for  each  basic  event  (based  on  the  posterior 
distribution obtained from the Bayesian update), and also the
5th,  50th  and  95th  percentiles. 

In  using  the  distributions  in  this  table,  one  must  keep  in  mind 
that  the  two  isolation  valves  in  the  APU  are  assumed  to  have 
identical  distributions  for  each  failure  mode,  as  are  the  three 
MPUs.  Thus,  for  example,  the  frequencies  of  the  basic  events 
BAM2H  and  BAM3H  (MPUs  two  and  three,  respectively,  fail  high  at 
start)  are  described  by  the  distribution  shown  in  Table  7.5-4  for 
the basic event BAM1H (MPU one fails high at start). Similarly,
the frequency of the basic event BAVBO (isolation valve B fails
to open on demand) is described by the distribution for BAVAO
(isolation valve A fails to open on demand).

As discussed in Section 7.0, the Bayesian analysis used to develop
the distributions shown in Table 7.5-4 automatically assigns the
appropriate weights to the observed data and the prior distribution,
respectively, based on the relative strength of the two
types of evidence in each particular situation. For example, when
a great deal of empirical data is available, the data will tend to
dominate the posterior. Conversely, when relatively little empirical
data is available, the posterior distribution will tend to
resemble  the  prior  distribution;  in  this  case,  the  data  is  simply 
not  strong  enough  to  override  the  information  contained  in  the 
prior  distribution. 

For most of the basic events shown in Table 7.5-4, relatively
little failure data was available: at most one or two observed
failures, and often none. Therefore, most of the posterior distributions
look fairly similar to the priors on which they were
based. However, a general trend can be seen. In cases where no
failures were observed, the posterior is slightly lower than the
prior. This is a result of the Bayesian inference process, and
is intuitively reasonable. This effect is greatest when the prior
distribution extends to include fairly high failure rates, which
are inconsistent with the lack of observed failures. Similarly,
in cases where one or more failures were actually observed, the
posterior distribution is generally slightly higher than the prior
distribution. With the small amounts of exposure data available
for most components, even a single failure is often sufficient to
suggest that the failure rate might be higher than is indicated
by the prior distribution.
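
The direction of this trend can be illustrated with a small numerical
Bayesian update. The sketch below is not the RISKMAN algorithm; it
assumes a lognormal prior discretized on a logarithmic grid and a
Poisson likelihood for time-based failures. The prior percentiles and
the 110-hour exposure are those quoted in Section 7.6.1.1 for turbine
breakup at normal speed; the one-failure case is hypothetical and is
included only for comparison.

```python
import math

Z95 = 1.6448536269514722  # 95th percentile of the standard normal


def lognormal_grid(p05, p95, n=2001, span=6.0):
    """Discretize a lognormal prior on a uniform grid in ln(rate)."""
    mu = 0.5 * (math.log(p05) + math.log(p95))
    sigma = (math.log(p95) - math.log(p05)) / (2.0 * Z95)
    xs = [mu + span * sigma * (2.0 * i / (n - 1) - 1.0) for i in range(n)]
    rates = [math.exp(x) for x in xs]
    weights = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs]
    total = sum(weights)
    return rates, [w / total for w in weights]


def poisson_update(rates, prior, failures, exposure_hours):
    """Posterior weights given `failures` observed in `exposure_hours`."""
    post = [p * (r * exposure_hours) ** failures * math.exp(-r * exposure_hours)
            for r, p in zip(rates, prior)]
    total = sum(post)
    return [w / total for w in post]


def grid_mean(rates, weights):
    return sum(r * w for r, w in zip(rates, weights))


rates, prior = lognormal_grid(2.7e-5, 1.8e-3)
prior_mean = grid_mean(rates, prior)
mean_zero_failures = grid_mean(rates, poisson_update(rates, prior, 0, 110.0))
mean_one_failure = grid_mean(rates, poisson_update(rates, prior, 1, 110.0))
```

With zero failures the posterior mean falls below the prior mean, and
with one (hypothetical) failure it rises above it, which is the
qualitative behavior described above.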


The  frequencies  of  a few  basic  events  were  described  by  point 
estimates  instead  of  distributions,  usually  on  the  basis  that 
their  frequencies  were  negligible  or  were  known  very  precisely. 
Most  of  these  events  were  considered  to  be  negligible  for  the 
purposes  of  this  study,  and  were  therefore  assigned  frequencies 
of zero. The events in this category included the following:

a.  Spurious  activation  of  the  isolation  valve  automatic 
shutdown  signal  (basic  events  PACRA  and  PACRB  during 
operation, and BACRA and BACRB at start-up). This
failure  mode  is  considered  extremely  unlikely. 

b.  A number of APU start failures, which were considered
extremely unlikely: BAFTN (GN2 leakage into the fuel
tank at start); BAGGS (failure of the gas generator);
BALFB and BAPFB (plugging of the inline fuel filter and
the fuel pump filter); and BARVO (inadvertent opening
of the fuel pump relief valve).

c.  Common cause failure of two or more APUs due to causes
other than lube oil blockage (basic event DAOCC). The
frequency  of  other  common  cause  failure  modes  was 
considered  to  be  dominated  by  the  frequency  of  lube  oil 
plugging. 

d.  Common  cause  failure  of  both  fuel  control  valves  in  the 
open position (basic event TACCF). This is considered
much  less  likely  than  independent  failure  of  both  valves 
due  to  mechanical  and/or  control  problems  because  one  of 
the  valves  fails  in  the  open  position  upon  loss  of  power 
and  the  other  one  fails  closed.  The  detached  valve  seat 
single  point  failure  is  likewise  considered  to  be  of  very 
low  probability. 

e.  Gearbox failure due to loss of lube oil: basic events
PALLL for a lube oil line leak and PAGBL for a gross
gearbox  leak.  The  frequency  of  these  failure  modes  was 
considered  to  be  dominated  by  the  frequency  of  smaller 
leaks  resulting  in  repressurization,  as  modeled  by  basic 
event  PAAGL. 

f.  Basic  event  PAHSP  (high  speed  operation  selected  during 
ascent). High speed operation would not be manually
selected  by  the  shuttle  crew  during  the  ascent  phase, 
unless  at  least  one  APU  had  shut  down. 

g.  Valve  leakage  after  closure  at  the  end  of  the  ascent 
phase  (PAVAZ  and  PAVBZ  for  the  isolation  valves,  and 
PASVZ for the secondary valve). Fuel depletion due to
valve leakage after closure is modeled in Stage A
(ascent), but is quantified in Stage B (orbit).

h.  Failure of the water spray boiler (basic event PAWSB).
The water spray boiler is out of scope for this analysis,
and is included in the fault trees only for completeness.

Additional events that were assigned point estimates other than
zero are as follows:

a.  Basic event DAPAT. This is a correction factor
reflecting the fact that if a spurious shutdown occurs,
only two APUs instead of three may be subject to turbine
overspeed and other failure modes. This basic event
was conservatively assigned a value of 1.0, which is
equivalent to ignoring the correction factor.

b.  Basic event PAOSK. Successful functioning of the overspeed
shutdown circuitry and successful closure of the secondary
valve. This event was assigned a likelihood of 1.0 in
the PA and PB fault trees because the chance of failure is
extremely small; a value of 1.0 is a highly accurate approximation
of the probability of success.

c.  The likelihood that automatic shutdown is enabled (basic
event SAPEQ). The frequency of this condition is assumed to
be 1.0 in cases where no other APU failures have occurred,
since automatic shutdown would not be disabled in the absence
of a failed APU.

d.  An order correction factor (basic event PALKF) for the conditional
probability that a gearbox leak occurs subsequent to
the failure of a component needed to respond to the leakage
(e.g., the GN2 valve). This order correction factor was
assumed to equal 0.5.

e.  Order correction factors for the sequencing of spurious shutdowns
and other APU failures. The likelihood that the
spurious shutdown would have occurred first (basic event
SASSD) was assumed to equal 0.5. The likelihood of the other
failure occurring first (basic event SAOFO) was taken to be
one minus the frequency of SASSD (i.e., also equal to 0.5).

f.  The conditional probability that a fuel system leak occurs
upstream of the isolation valves (basic event CIUSL). This
was estimated to equal 0.3, based on a ratio of the frequency
of tank leaks to the total frequency of all fuel system leaks.

g.  The conditional probability that a fuel system leak occurs
downstream of the isolation valves, but upstream of the
secondary valve (basic event C1DSL). This was estimated to
equal 0.5 based on the locations of the observed fuel system
leaks to date.

Prior distribution not updated; observed data already incorporated.

TBSVS  Over/underspeed  Fail on at start

This failure rate was multiplied by 1.5 in the split fraction equations, to reflect the
fact that repressurization was required twice instead of once in about half of the observed
instances of gearbox leakage, for an average of 1.5 demands per leak.

This failure rate was multiplied by 3 in the split fraction equations, to reflect the fact
that the isolation valves have 3 drivers in series.


7.6  SPATIAL INTERACTIVE EVENT DATA DEVELOPMENT


Based  on  the  discussion  of  Section  6.6,  two  types  of  spatial 
interactive  events  (SIEs)  were  identified  as  significant  for 
development  of  probability  distributions. 

a.  Events  related  to  APU  turbine  breakup 

b.  Events  related  to  APU  fuel  (hydrazine)  leakage 

Each  SIE,  to  be  a meaningful  input  to  the  PRA,  must  be  defined 
as  a conditional  probability  and  described  in  the  probability 
of  frequency  format.  However,  the  frequencies  associated  with 
SIEs  are  less  amenable  to  direct  calculation  than  are  those 
associated  with  component  failures,  for  which  failure  history 
data  is  available.  The  approach  to  developing  the  probability 
distributions  was  to  collect  and  analyze,  to  the  greatest  extent 
possible,  all  information  relevant  to  the  SIEs.  Examples  of 
available  sources  were  drawings,  test  reports,  formal  and 
informal  analyses,  and  telecons.  Candidate  probability 
distributions  were  then  proposed  using  the  assembled  and 
analyzed  data. 

A group  of  systems  experts,  hereinafter  referred  to  as  the 
"Group",  whose  function  was  discussed  in  Section  5.10,  was 
assembled  to  review  the  most  significant  SIEs  and  propose 
probability  distributions  as  a group. 

There  were  some  instances  where  test  data  or  analyses  were 
available,  but  time  constraints  did  not  allow  adequate  melding 
of  opinion  and  the  available  data.  For  those  cases,  the  final 
probability  distribution  represents  a judicious  weighing  of  the 
two  sets  of  inputs. 

Table  7.6-1  presents  the  split  fractions  required  for  input 
into  the  APU  PRA. 


7.6.1  SIE  Data  Related  to  APU  Turbine  Breakup 

The  following  paragraphs  present  the  probability  of  frequency 
distributions developed to represent the conditional probabilities
related to APU turbine breakup, and discuss the data
that  support  these  distributions. 

TABLE 7.6-1

APU SPLIT FRACTIONS FOR SIEs

Name    Split Fraction

F1      Pr (APU Turbine Fail | Primary and Secondary Valves
        Fail Open)

F3      Pr (Uncontained Shrapnel | Turbine Breakup Due to
        Overspeed)

F3N     Pr (Uncontained Shrapnel | Turbine Breakup at Normal
        Speed)

F5      Pr (Failure of Second APU or Flight Control Equipment
        (FCE) | Shrapnel Due to Turbine Breakup at Overspeed)

F5N     Pr (Failure of Second APU or FCE | Shrapnel Due to
        Turbine Breakup at Normal Speed)

F7      Pr (Fuel Leak | Uncontained Shrapnel from Second APU)

F12     Pr (APU Fail | Small Leak in That APU)

F13     Pr (APU Fail | Small Leak in Another APU)

F15     Pr (2 APUs or FCE Fail | Small Leak in One of the
        Two APUs)

F17     Pr (2 APUs or FCE Fail | Small Leaks in at Least Two
        APUs)


7.6.1.1  Probability of Turbine Breakup at Normal Speed

The  probability  of  turbine  breakup  at  normal  speed  was  developed 
from military data (NPRD-3, Reference 99), as modified by
environment  factors  appropriate  to  missile  launches.  This  prior 
probability  distribution  is  shown  in  the  previous  paragraph, 
Table  7.5-2.  The  actual  failure  history  of  APU  turbines  (zero 
failures  in  110  hours)  was  then  used  as  the  likelihood  to  arrive 
at  the  posterior  distribution  also  presented  in  the  previous 
paragraph,  Table  7.5-4,  via  the  use  of  Bayes'  Theorem. 

The  information  used  for  NPRD-3  is  a compilation  from  diverse 
small, high speed turbines used in military applications. The
information did not provide the fraction of turbine failures
which actually yielded turbine hub breakup. Therefore, a large
uncertainty was assigned to the NPRD value to account for the
applicability of the NPRD source to APU turbine breakup. The
distribution used for this study was a lognormal with a 5th
percentile of 2.7 x 10^-5 failures/hour and a 95th percentile of
1.8 x 10^-3 failures/hour. This distribution was used in the
evaluation  of  Top  Events  PA,  PB  and  DB. 


7.6.1.2  Probability of Turbine Breakup Due to Overspeed

The  Group  discussed  the  likelihood  that  a turbine  breakup  would 
result  from  an  overspeed  induced  by  failure  of  both  the  primary 
and  secondary  fuel  control  valves  in  the  open  position.  If  the 
valves  fail  open,  a high  speed  shutdown  should  be  commanded  with 
automatic  closing  of  the  fuel  isolation  valves.  This  closing 
would  limit  the  supply  of  fuel  to  the  turbine  and  thus  limit 
the  peak  rotation  rate.  It  is  not  certain,  however,  that  this 
closing  will  prevent  breakup  due  to  overspeed,  since  overspeed 
conditions  are  reached  in  approximately  300  milliseconds,  and 
the  fuel  line  downstream  of  the  isolation  valve  contains 
sufficient hydrazine for 2 seconds of operation (about 7 pulses).
In  view  of  the  above,  the  Group  judged  that  if  the  fuel  control 
valves  fail  open,  there  is  a very  high  probability  that  the 
turbine  will  break  up.  This  is  expressed  in  the  following 
discrete  probability  distribution. 


[Figure: discrete probability distribution Pr(f1) versus Frequency of Occurrence (f1)]

Pr(f1)    .1     .2     .5     .2

f1        .65    .75    .85    .95

This distribution was used in the evaluation of event tree Top
Events CA and CB following occurrence of TA and TB respectively.
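
A discrete distribution of this kind can be checked and summarized in a few lines. The sketch below is illustrative only: the probabilities and the values .65, .85 and .95 are taken from the table, while the second value is assumed to be .75 from the even spacing, since it is not legible in the source. The function name is hypothetical.

```python
# (frequency value, probability) pairs for f1.
# The 0.75 entry is an assumption inferred from the even spacing of the
# other values; it is not legible in the scanned table.
F1 = [(0.65, 0.1), (0.75, 0.2), (0.85, 0.5), (0.95, 0.2)]


def mean_of_discrete(dist):
    """Return the mean after confirming the probabilities sum to one."""
    total = sum(p for _, p in dist)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, not 1")
    return sum(value * p for value, p in dist)


mean_f1 = mean_of_discrete(F1)
print(f"mean of f1 = {mean_f1:.2f}")
```

Under that assumption the mean is 0.83, consistent with the Group's judgment of a very high breakup probability.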


7.6.1.3 Probability of Uncontained APU Shrapnel as a
Consequence  of  Turbine  Breakup  at  Overspeed 

The  probability  of  having  uncontained  fragments  as  a result  of 
a turbine  breakup  is  determined  by  the  expected  breakup  speed 
and  by  the  ability  of  the  APU  structure  to  contain  fragments 
at  the  expected  energy  levels. 

There  have  been  four  incidents  of  turbine  hub  breakup  during 
testing.  These  are  shown  in  Table  7.6-2.  In  each  case,  breakup 
resulted  from  overspeed. 


TABLE 7.6-2

TURBINE HUB BREAKUP DATA

Unit Test                 Actual Breakup Speed

S/N 003 (Unnotched)       107,520 rpm (149.3%)
S/N 106 (Unnotched)       112,600 rpm (156.0%)
S/N 105 (Notched)          84,240 rpm (117%)
Rig Test (Drilled)         99,700 rpm (138.5%)

The  data  from  three  of  the  four  turbine  hub  fragmentation 
incidents  were  utilized  to  estimate  the  turbine  hub  breakup 
likelihood as a function of turbine speed (Reference 61).

The  S/N  105  unit  breakup  speed  was  adjusted  to  estimate  the 
unnotched  breakup  speed.  The  rig  test  data  was  not  included 
since  no  information  was  available  about  the  configuration 
or test conditions.

The results of this analysis of test data indicate that a mean
turbine  hub  breakup  speed  of  108,000  rpm  (150%)  and  a standard 
deviation  of  4,267  rpm  should  be  used  in  evaluating  the  effects 
of  a turbine  breakup  due  to  overspeed.  This  analysis  assumed 
that  breakup  speed  is  normally  distributed  and  that  unit  S/N 
105,  as  modified,  is  a valid  data  point  for  the  analysis.  It 
ignores  the  effect  of  life  cycles  on  breakup  speed. 

Reference  25  presents  calculations  to  estimate  APU  turbine 
overspeed  required  to  burst  the  containment  ring  and  produce 
shrapnel.  Estimated  speed  for  this  event  was  96,900  rpm  or 
134.6%  of  normal  operating  speed.  Thus,  the  likelihood  of  APU 
fragments  being  uncontained  is  the  likelihood  that  the 
fragmentation  speed  will  exceed  96,900  rpm. 
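
The likelihood stated above follows directly from the assumed normal distribution. The sketch below is a minimal illustration using only the mean, standard deviation and ring burst speed quoted in this paragraph; the function name is hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


MEAN_BREAKUP_RPM = 108_000.0  # mean hub breakup speed from the test data
SD_BREAKUP_RPM = 4_267.0      # standard deviation from the test data
RING_BURST_RPM = 96_900.0     # speed needed to burst the containment ring

z = (RING_BURST_RPM - MEAN_BREAKUP_RPM) / SD_BREAKUP_RPM
p_uncontained = 1.0 - normal_cdf(RING_BURST_RPM, MEAN_BREAKUP_RPM, SD_BREAKUP_RPM)

print(f"z = {z:.2f}, P(breakup speed > ring burst speed) = {p_uncontained:.4f}")
```

The ring burst speed lies about 2.6 standard deviations below the mean breakup speed, so the exceedance probability is greater than 0.99, which is in line with the value of 1.0 assigned by the Group.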

The  group  of  systems  experts  discussed  the  information  relating 
to  the  fragmentation  incidents  and  agreed  that  a breakup  due  to 
an  APU  overspeed  would  certainly  produce  uncontained  shrapnel, 
and  therefore  assigned: 


P(f3)    1.0

f3       1.0

This  value  was  used  in  the  evaluation  of  event  tree  Top  Events 
CA  and  CB  following  occurrence  of  TA  and  TB,  respectively. 


7.6.1.4 Probability of Uncontained APU Shrapnel as a
Consequence of Turbine Breakup at Normal Speed


The information presented in paragraph 7.6.1.3 is also valid
for  assessing  the  effects  of  turbine  breakup  at  normal  speed. 
However,  even  though  unit  S/N  105  broke  up  at  a speed  below 
that  required  to  burst  the  containment  ring,  fragments  bypassed 
the  containment  ring  and  exited  through  the  APU  housing.  This 
was  attributed  to  the  effects  of  notches  in  the  turbine  hub 
(Reference 96). The Group, in considering this failure, judged
that  any  turbine  that  broke  up  at  normal  speed  would  have  to  be 
seriously  flawed  and,  hence,  assigned  the  following  discrete 
probability distribution.

.50

This distribution was used in evaluating event tree Top Events
CA  and  CB  following  occurrence  of  PA  and  PB,  respectively. 


7.6.1.5 Probability of a Second APU or Flight Critical
Equipment Failure as a Consequence of Uncontained
Shrapnel from a Turbine Breakup at Overspeed

Given  uncontained  shrapnel  from  a turbine  overspeed,  the 
likelihood  that  this  shrapnel  would  cause  a second  APU  or  flight 
critical equipment to fail is determined by three factors: the
energy level of uncontained shrapnel, the likelihood of an
uncontained fragment striking the equipment, and the vulnerability
of the equipment.

The  energy  of  the  uncontained  fragments  can  be  estimated  as  the 
energy  of  the  turbine  hub  fragments  minus  the  minimum  energy 
required  to  burst  the  containment  ring.  The  energy  of  the 
turbine  hub  fragments  was  estimated  by  extending  the  calculations 
in Reference 25. It was assumed that the hub would break into
three 120° segments, each weighing approximately 0.9 pounds.
This breakup pattern is consistent with observed test failures
(Reference 96).

Using  the  approach  of  Reference  25,  the  minimum  energy  required 
to  burst  the  containment  ring  is  calculated  to  be  19,359  lb-ft. 
The  energy  of  APU  turbine  fragments  and  the  energy  of  resulting 
uncontained  fragments  at  various  speeds  are  presented  in  Table 
7.6-3.

TABLE 7.6-3

APU UNCONTAINED FRAGMENT ENERGIES

Oper. Speed   ω          Fragment Energy   Uncontained Frag. Energy
(%)           (rpm)      (lb-ft)           (lb-ft)

100            72,000    10,688                 0
110            79,200    12,932                 0
120            86,400    15,390                 0
130            93,600    18,063                 0
140           100,800    20,948             1,589
150           108,000    24,048             4,689
160           115,200    27,361             8,002
170           122,400    30,888            11,529
180           129,600    34,629            15,270
190           136,800    38,584            19,225
200           144,000    42,752            23,393
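
The entries in Table 7.6-3 are consistent with fragment energy scaling as the square of turbine speed, less the ring burst energy. The sketch below reproduces the table to within about 1 lb-ft under that assumption. The quadratic scaling is inferred from the tabulated values and is not stated in the text, and the function names are hypothetical.

```python
import math

E_AT_100_PCT = 10_688.0  # fragment energy at 100% speed (72,000 rpm), lb-ft
E_RING_BURST = 19_359.0  # minimum energy to burst the containment ring, lb-ft
NORMAL_RPM = 72_000.0    # 100% operating speed, rpm


def fragment_energy(speed_pct):
    """Fragment energy in lb-ft, assuming energy scales with speed squared."""
    return E_AT_100_PCT * (speed_pct / 100.0) ** 2


def uncontained_energy(speed_pct):
    """Energy left after bursting the ring; zero if the ring holds."""
    return max(0.0, fragment_energy(speed_pct) - E_RING_BURST)


def ring_burst_speed_pct():
    """Speed (percent of normal) at which fragment energy equals burst energy."""
    return 100.0 * math.sqrt(E_RING_BURST / E_AT_100_PCT)


for pct in range(100, 210, 10):
    rpm = NORMAL_RPM * pct / 100.0
    print(f"{pct:3d}%  {rpm:9,.0f} rpm  "
          f"{fragment_energy(pct):9,.0f}  {uncontained_energy(pct):9,.0f}")

print(f"ring burst speed = {ring_burst_speed_pct():.1f}% of normal")
```

The same scaling gives a ring burst speed of about 134.6% of normal, matching the 96,900 rpm figure quoted from Reference 25.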

The  likelihood  of  an  uncontained  fragment  striking  a given  piece 
of  equipment  must  account  for  the  fragment  spray  pattern  and  the 
location  of  the  equipment  in  the  aft  compartment  relative  to  the 
fragmentation  source. 

The  fragment  spray  pattern  that  would  result  from  an  uncontained 
APU  turbine  fragmentation  is  difficult  to  define  analytically 
because of the random nature of the particle paths, the lack
of failure data, and the complex APU containment ring geometry.
Limited test data documentation, Reference 96, states that:

"The  pieces  of  the  turbine  wheel  (hub)  that  were  not 
contained  by  the  [turbine]  housing  exited  in  a radial 
direction.  In  the  S/N  106  burst,  the  section  of  wheel 
found in the aluminum heater panel exited within 15°
of  the  true  radial." 

Based  on  this  limited  data,  the  possibility  of  fragments  striking 
equipment at angles of at least 15° above or below the plane of
the  turbine  wheel  must  be  considered. 

The expectation that the turbine will fragment into three 120°
segments means that there is a 100% likelihood that any 120° arc of
the x-y plane will contain one fragment. The likelihood of a given
fragment direction within the 120° arc is uniformly distributed.
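
The consequence of this spray model for a single target can be sketched as follows. This is a simplified, hypothetical illustration and not the spatial analysis of Reference 61: it considers only directions in the plane of the turbine, assumes three fragments exactly 120° apart with a uniformly random orientation, and describes the target solely by the in-plane arc it subtends as seen from the turbine.

```python
import random


def hit_probability(target_arc_deg):
    """Chance that at least one of three fragments heads into the target arc."""
    return min(1.0, target_arc_deg / 120.0)


def simulate_hit_probability(target_arc_deg, trials=20_000, seed=1):
    """Monte Carlo check of hit_probability under the same assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Direction of the first fragment, uniform within its 120 degree arc;
        # the other two fragments follow at 120 degree intervals.
        first = rng.uniform(0.0, 120.0)
        directions = (first, first + 120.0, first + 240.0)
        # The target occupies the arc from 0 up to target_arc_deg.
        if any(d % 360.0 < target_arc_deg for d in directions):
            hits += 1
    return hits / trials


analytic_30 = hit_probability(30.0)
simulated_30 = simulate_hit_probability(30.0)
print(f"30 degree target: analytic {analytic_30:.3f}, simulated {simulated_30:.3f}")
```

Under these assumptions any target subtending 120° or more in the plane is certain to lie in the path of a fragment, which is the point made in the paragraph above.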

To assess the likelihood of a fragment striking a piece of
equipment, a spatial analysis (Reference 61) was performed. This
analysis considered the item's location above and below the plane
of the turbine and its profile area. Table 7.6-4 lists equipment
considered susceptible to being struck by APU fragments within
±15° of the turbine plane of rotation.

TABLE  7.6-4 

POTENTIAL TARGETS OF APU TURBINE FRAGMENTS


EQUIPMENT  APU  1 

APU  2 

APU  3 

LEFT  OMS  OXID  TK  X 

X 

X 

LEFT  OMS  FU  TK 

X 

X 

RT  OMS  OXID  TK  X 

X 

X 

RT  OMS  FU  TK  X 

X 

LEFT  RCS  OXID  TK  X 

X 

X 

RT  RCS  OXID  TK  X 

X 

X 

RT  RCS  FU  TK 

X 

APU  1 FU  TK  X 

X 

APU  2 FU  TK 

X 

APU  3 FU  TK  X 

X 

X 

AVIONICS  BAY  4 

X 

AVIONICS  BAY  5 X 

HYD  RES  3 X 

X 

APU  1 FU  LINE  X 

X 

APU  2 FU  LINE  X 

X 

APU  3 FU  LINE 

X 

APU  1 LUBE  LINE  X 

X 

APU  2 LUBE  LINE 

X 

APU  3 LUBE  LINE 

X 

APU  1 EXHST  DUCT 

X 

SYS  1 HYD  LINE  X 

X 

X 

SYS  2 HYD  LINE  X 

X 

X 

SYS  3 HYD  LINE  X 

X 

X 

SSME LO2 MANIF

SSME 1 LO2 FDLINE X

X

X

SSME 2 LO2 FDLINE

SSME 3 LO2 FDLINE

SSME  LH2  MANIF  X 

X 

SSME  1 LH2  FDLINE  X 

X 

X 

SSME  2 LH2  FDLINE  X 

X 

X 

SSME  3 LH2  FDLINE 

X 

WIREBUNDLES  X 

X 

X

This table also lists "lines" and wirebundles that, because of
their location, are considered potential targets. Because of their
complex geometry, a subjective assessment of their susceptibility
was performed instead of a quantitative spatial analysis.

Given the fragment spray pattern described above and the proximity
of the APUs to the 1307 bulkhead and sidewalls, it must be
considered that there is a very high likelihood that an uncontained
fragment will strike either the bulkhead or a sidewall.

To  assess  the  vulnerability  of  equipment,  it  is  necessary  to 
consider  the  penetration  capability  of  the  fragments  relative 
to  the  characteristics  of  the  target  equipment.  Reference  105 
discusses  the  effects  of  fragmentation  of  jet  engine  turbines. 

The  penetration  capability  of  uncontained  APU  fragments  was 
estimated  by  using  the  analysis  approach  of  Reference  105  and 
the  uncontained  fragment  energies  presented  in  Table  7.6-3 
above.  All  fragments  were  assumed  to  strike  perpendicular  to 
the  surface.  The  penetration  depth  in  common  material  for 
various  overspeeds  is  shown  in  Table  7.6-5  below. 


TABLE 7.6-5

PENETRATION CAPABILITY OF UNCONTAINED FRAGMENTS

Overspeed   ω          Penetration Depth (Inches)
(%)         (rpm)      Al            Ti            Steel

130          93,600    [illegible]   [illegible]   [illegible]
140         100,800    [illegible]   [illegible]   [illegible]
150         108,000    [illegible]   [illegible]   [illegible]
160         115,200    0.80          [illegible]   [illegible]
170         122,400    0.96          0.44          0.39
180         129,600    1.11          0.51          0.45
190         136,800    1.24          0.57          0.50
200         144,000    1.37          0.68          0.55

Al = Aluminum
Ti = Titanium

The physical dimensions of selected equipment installed in the
aft compartment are presented in Table 7.6-6. As may be seen
by comparing the data of Table 7.6-6 with those of Table 7.6-5,
the theoretical penetration capability of uncontained fragments
significantly exceeds the panel thickness of the critical
equipment in Table 7.6-6, as well as portions of the 1307 bulkhead
and aft compartment side walls.

TABLE 7.6-6

WALL THICKNESSES OF SELECTED EQUIPMENT

Equipment                  Diameter        Wall Thickness    Material
                           (Inches)        (Inches)

Hydraulic Pressure Lines   1-1/4 to 1/4    .065 to .020      Titanium
Hydraulic Return Lines     7/8 to 1/4      .026 to .020      Titanium
MPS 17" LH2 Line           17              .040              Inconel 718
  Vacuum Jacket                            .025              CRES 321
MPS LH2 Manifold           —               .063              Inconel 718
  Vacuum Jacket                            .040              CRES 321
MPS 12" LH2 Line           12              .032              Inconel 718
  Vacuum Jacket                            .040              CRES 321
MPS 17" LO2 Line           17              .050              Inconel 718
  Vacuum Jacket                            .040*             CRES
MPS LO2 Manifold           —               .080              Inconel 718
  Vacuum Jacket                            Foam              CRES 321
MPS 12" LO2 Line           12              .050              Inconel 718
  Vacuum Jacket                            .040*             CRES 321
APU Fuel Lines             1/2             .025              Stnls Steel
APU GN2 Lines              3/8             .020              Stnls Steel
1307 Bulkhead Pnl                          .050              Aluminum
Aft Comprt Sdwls                           .136 to .070      Aluminum
OMS Deck Panels                            .070 & .063       Aluminum
OMS Tanks                                  .0759             Titanium
Avionics Bays                              1" Honeycomb      Aluminum
                                           .020 Facing Sh    Aluminum

*OV 102 only. LO2 lines are foam insulated on other vehicles.

In a discussion with a representative of the JSC Materials
Technology  Branch,  some  test  results  were  obtained  in  which 
a 1 inch  long  cylindrical  object  was  projected  at  a .05  inch 
thick  panel  similar  to  the  1307  bulkhead  panels.  The  panel  was 
penetrated by a .5 lb projectile traveling at 100 ft/sec. The
energy  of  the  projectile  was  77.7  lb-ft,  which  is  relatively 
small  by  comparison  with  the  estimated  energies  for  turbine 
wheel  fragments  in  Table  7.6-3.  These  test  results  were  not 
yet  published  at  the  time  this  study  was  conducted. 
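
The quoted projectile energy can be verified from the weight and speed given above. The sketch below is a minimal check using standard gravity in ft/s^2; the comparison value of 1,589 lb-ft is the smallest nonzero uncontained fragment energy in Table 7.6-3 (140% speed). The function name is hypothetical.

```python
G_FT_PER_S2 = 32.174  # standard gravity, used to convert weight (lb) to mass (slug)


def kinetic_energy_lb_ft(weight_lb, speed_ft_per_s):
    """Kinetic energy in lb-ft of a projectile of the given weight and speed."""
    mass_slug = weight_lb / G_FT_PER_S2
    return 0.5 * mass_slug * speed_ft_per_s ** 2


test_energy = kinetic_energy_lb_ft(0.5, 100.0)
ratio_at_140_pct = 1_589.0 / test_energy

print(f"test projectile energy = {test_energy:.1f} lb-ft")
print(f"uncontained energy at 140% speed is {ratio_at_140_pct:.0f} times larger")
```

Even the lowest nonzero uncontained fragment energy in Table 7.6-3 is roughly twenty times the energy that penetrated the test panel.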

Based  on  the  theoretical  estimates  and  the  limited  amount  of 
supporting  test  data,  it  must  be  concluded  that  all  of  the 
equipment  items  in  Table  7.6-4  are  vulnerable  to  being  damaged 
if  struck  by  uncontained  fragments  of  an  APU  turbine  hub.  A 
comprehensive  quantitative  analysis  pinpointing  exact  dimensions 
and  impact  angles  is  not  within  the  scope  and  budget  of  this 
study.

In view of the significant number of critical equipment items
subject to damage, and the complex geometry of the aft
compartment, it must be concluded, for purposes of this PRA, that an
uncontained turbine hub breakup has a very high probability of
causing loss of a second APU or Flight Critical Equipment (FCE)
and, consequently, crew and vehicle.

The  likelihood  of  puncturing  the  External  Tank  is  estimated  to  be 
zero.  The  centerline  of  the  External  Tank  is  parallel  to  the 
plane of rotation of the turbine. A particle on a trajectory 15°
below the plane of the turbine would have to penetrate a significant
portion  of  the  Orbiter  to  strike  the  External  Tank,  including  the 
1307  Bulkhead,  the  payload  bay  (and  payloads)  and  finally,  the 
Orbiter  skin  and  tiles. 

The  Group  discussed  the  likelihood  that  uncontained  shrapnel  from 
a turbine  overspeed  would  cause  a second  APU  or  flight  critical 
equipment  to  fail.  In  addition  to  failures  caused  by  the  shrapnel 
per  se,  the  group's  considerations  included  the  failure  of  another 
APU  or  FCE  as  a result  of  the  fuel  leakage  that  would  accompany 
the  breakout  of  shrapnel. 

It  was  recognized  that  the  conditional  probabilities  would  differ 
between ascent and descent. For example, during ascent the
prelaunch nitrogen purge in the aft compartment precludes combustion
from hydrazine sources. The main engine feedlines flowing large
quantities of liquid oxygen and liquid hydrogen represent a severe
threat if struck by shrapnel. However, during descent the main
engine feedlines are inert, but air, not available during ascent,
enters the aft compartment, providing an environment which will
support combustion. This is relevant to the possibility of fire
resulting from a hydrazine leak.

After considering the factors discussed in the initial
paragraphs above, it was decided to use the following probability
of frequency distributions for ascent and descent.

This distribution was used in the evaluation of Event Tree Top
Event CA after the occurrence of TA.

This distribution was used in the evaluation of event tree Top
Event CB after the occurrence of TB.


7.6.1.6 Probability of a Second APU or Flight Critical
Equipment Failure as a Consequence of Uncontained
Shrapnel from a Turbine Breakup at Normal Speed

The  data  from  Table  7.6-3  above  indicates  that  the  energy  from 
a breakup  at  normal  speed  would  not  be  high  enough  to  break  the 
containment ring. However, as explained in Section 7.6.1.4,
the  Group,  based  on  the  S/N  105  test,  judged  that  there  was  a high 
probability  that  shrapnel  would  bypass  the  containment  ring  and 
break  through  the  APU  housing.  In  considering  this  conditional 
probability,  they  also  recognized  that  the  shrapnel  would  be  of 
a lower  energy  level  and  assigned  the  following  distributions. 

Ascent - f5N

[Figure: discrete probability distribution P(f5N) versus Frequency of Occurrence]

P(f5N)    .1     .25    .3     .25    .1

f5N       .1     .3     .5     .7     .9

This distribution was used in the evaluation of Top Event CA
after the occurrence of PA.

P(f9N)    .15    .35    .30    .15    .05

f9N       .1     .3     .5     .7     .9

This distribution was used in the evaluation of CB following the
occurrence of PB.


7.6.1.7 Probability of a Hydrazine Leak as a Consequence
of  Uncontained  Shrapnel  From  Another  APU 

The  Group  considered  this  conditional  probability  to  include 
the  possible  effect  of  a fire  which  would  certainly  accompany 
a turbine  breakup.  STS-9  dramatically  illustrated  how  a leak 
and  fire  from  one  APU  can  affect  a second  APU.  Thus,  while  the 
probability  of  an  APU  being  struck  by  a fragment  is  the  same 
for  ascent  and  descent,  the  likelihood  of  a fuel  fire  on  descent 
makes  damage  more  likely  during  that  phase.  The  Group  assigned 
the following probability distributions:

This distribution was used in the evaluation of event tree
Top Event FA after the occurrence of TA without the occurrence
of CA.

In the event of a large hydrazine leak during entry (e.g., the
contents of an APU fuel tank leak into the aft compartment), the
experts surveyed believe that a large fire would result, leading
to loss of the crew and vehicle.

APU  fuel  leakage  may  ultimately  lead  to  hydrazine  detonation. 
Hydrazine  explodes  when  heated  to  a detonation  temperature 
determined  by  the  surface  material  in  contact  with  the  hydrazine. 
The  detonation  temperature  may  be  achieved  by  heating  the 
hydrazine  or  by  adiabatically  compressing  hydrazine  containing 
vapor  bubbles. 

Under adiabatic compression conditions, the threshold temperature
limit for explosive decomposition of hydrazine was between 217°F
and 195°F in containers made of CRES-321, Hastelloy-X, Haynes-25,
CRES-316, CRES-347 and CRES-304L. These metals are listed in order
of decreasing threshold temperature (Reference 92). Without
compression, liquid hydrazine will explode at high temperature. Tests
performed at the NASA White Sands Test Facility indicate that
liquid hydrazine in a test chamber constructed of 304 stainless
steel may explode at temperatures above about 445°F (Reference 88).


7.6.2.1 Probability of APU Failure Given a Small Fuel Leak
in  That  APU 

Several  mechanisms  contribute  to  the  event  of  an  APU  failure 
given  a small  fuel  leak  in  that  APU.  A small  contribution  to 
the  frequency  of  this  event  arises  from  APU  fuel  leaking  into 
the  solenoid  cavity  of  a fuel  isolation  valve  or  gas  generator 
control  valve  and  detonating,  thereby  interrupting  usage  of  the 
valve.  A second  small  contributor  to  this  event  is  leakage  by 
means  of  breakage  of  the  carbon  face  of  an  APU  fuel  pump  seal, 
allowing the hydrazine to detonate because of metal rubbing
against metal, and thereby causing the fuel pump to fail.

Aside from fire, two additional mechanisms are relevant to
hydrazine leakage into the aft compartment. These are: (1) stripping
of Kapton electrical wiring insulation leading to loss of GGVM
control, and (2) hydrazine decomposition initiated by some
catalyst in the environment. Hydrazine is known to dissolve
Kapton. However, the Group could not agree as to whether the
APU Kapton wiring insulation would be removed in the time
interval (about 8 minutes) between APU startup and attaining
an altitude at which hydrazine cannot exist as a liquid. With
regard to hydrazine decomposition, it is known that catalysts
exist which cause hydrazine to decompose at room temperature.
An example of such a reaction is taken from Reference 91.

"When  two  or  three  drops  of  hydrazine  were  dropped  onto 
a layer  of  ferric  oxide  ...  at  the  temperature  of  the 
laboratory  ...  in  a nitrogen  atmosphere,  sparking 
occurred,  and  the  oxide  became  red  hot,  but  flame  did 
not  appear  ..." 

Again  the  Group  differed  in  their  opinions  regarding  the  possible 
presence  of  such  a catalyst  in  the  Orbiter  aft  compartment. 

After  consideration  of  the  opinions  offered,  the  following 
assignment  was  made  for  the  distribution  associated  with  APU 
failure  given  a small  fuel  leak  in  that  APU  during  ascent: 


Pr(f12A)   .15    .35    .35    .15

f12A       .05    .15    .25    .35

This distribution was used in the evaluation of event tree Top
Events C1, C2 and C3 when no APUs had failed, and in the
evaluation of FA if an APU had previously failed.

Considering  the  increased  risk  due  to  fire  on  descent,  the 
following  assignment  was  made  for  the  split  fraction  associated 
with  APU  failure  given  a small  fuel  leak  in  that  APU  during 
descent: 


Pr(f12D)   .1     .2     .3     .25    .10    .05

f12D       .4     .5     .6     .7     .8     .9

This distribution was used in the evaluation of event tree Top
Events D1, D2, and D3 when no APUs had failed or in the evaluation
of FB if an APU had previously failed.
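
The difference between the ascent and descent assignments can be illustrated by sampling the two discrete distributions, as a Monte Carlo propagation of uncertainty would. This is a sketch only, using the tabulated values for f12A and f12D; the sample size, seed and function names are arbitrary and hypothetical, and it is not a description of how the study's software performed the calculation.

```python
import random

# value -> probability, as tabulated for ascent (f12A) and descent (f12D)
F12A = {0.05: 0.15, 0.15: 0.35, 0.25: 0.35, 0.35: 0.15}
F12D = {0.4: 0.10, 0.5: 0.20, 0.6: 0.30, 0.7: 0.25, 0.8: 0.10, 0.9: 0.05}


def exact_mean(dist):
    """Probability-weighted mean of a discrete distribution."""
    return sum(value * p for value, p in dist.items())


def sampled_mean(dist, n, seed):
    """Mean of n random draws from the discrete distribution."""
    rng = random.Random(seed)
    draws = rng.choices(list(dist.keys()), weights=list(dist.values()), k=n)
    return sum(draws) / n


mean_ascent = exact_mean(F12A)
mean_descent = exact_mean(F12D)
sampled_descent = sampled_mean(F12D, 20_000, seed=0)

print(f"ascent mean = {mean_ascent:.2f}, descent mean = {mean_descent:.2f}")
print(f"descent mean from 20,000 samples = {sampled_descent:.3f}")
```

The descent mean is about three times the ascent mean, which reflects the increased risk due to fire on descent.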


7.6.2.2 Probability of APU Failure Given a Small Fuel Leak in
Another  APU 

The  probability  of  APU  failure  given  a small  fuel  leak  in  another 
APU  was  agreed  by  the  Group  to  be  much  less  during  ascent  than 
the  probability  of  APU  failure  given  a small  fuel  leak  in  that 
APU.  Internal  fuel  leakage  in  another  APU  was  considered  to  pose 
a reduced  risk  since  the  resulting  detonation  could  produce,  at 
most, low energy shrapnel.


8.0 QUANTITATIVE RESULTS OF THE APU PRA

The PRA model was constructed from the top down. The analysis
started with the major functions of the shuttle, loss of which
would cause loss of mission or loss of crew and vehicle. This
failure logic was documented in the Master Logic Diagrams. Those
diagrams were developed to the level of initial failure categories
of the APU that could lead to the damage states LOC/V, intact
abort, PLS, or launch scrub. By means of event sequence diagrams,
all significant scenarios that could lead from an initial failure
to one of the damage states were defined and described. The event
trees and split fraction models provided further detail of the
scenarios in a form that was also quantifiable. The level of
detail was commensurate with the data collected from various
sources throughout NASA, and was generally at a component or
subcomponent level.

Quantification,  in  contrast  to  model  development,  is  performed 
from  the  bottom  up.  Probability  distributions  that  reflect 
actuarial  information  about  the  APU,  analysis,  maintenance 
procedures,  and  engineering  judgment  were  developed  for  each 
component,  subcomponent,  and  event  in  the  model.  The  minimal 
cut  sets  of  the  split  fraction  models  were  obtained  and  the 
appropriate  probability  distribution  assigned  to  each  basic  event 
in  the  cut  sets.  The  RISKMAN  software  permitted  the  development 
of  algebraic  equations  representing  each  split  fraction  and  use 
of  the  assigned  probability  distributions  to  obtain  the  numerical 
value  of  each  split  fraction  in  the  APU  event  tree.  Another 
module  of  RISKMAN  combined  the  split  fractions  to  obtain  the 
frequency  of  each  scenario.  Since  each  scenario  was  associated 
with a damage state (or the OK state), scenario frequencies were
summed,  as  shown  in  Section  5,  to  obtain  the  total  damage  state 
frequency. 
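
The bottom-up arithmetic described above can be sketched with a toy event tree. Everything in this example is hypothetical: the initiating frequency, the two top events, their split fractions and the end-state assignments are invented for illustration and are not taken from the APU model or from RISKMAN.

```python
from collections import defaultdict
from math import prod

# Hypothetical inputs, for illustration only.
INITIATING_FREQUENCY = 1.0e-3            # initiating failures per flight
SPLIT_FRACTIONS = {"TOP1": 0.8, "TOP2": 0.2}  # probability each top event fails

# Each scenario is a path through the tree plus the damage state it ends in.
# A path lists (top event, failed?) for every branch point it passes through.
SCENARIOS = [
    ([("TOP1", True)], "LOC/V"),
    ([("TOP1", False), ("TOP2", True)], "PLS"),
    ([("TOP1", False), ("TOP2", False)], "OK"),
]


def scenario_frequency(path):
    """Initiating frequency times the split fractions along the path."""
    branches = (
        SPLIT_FRACTIONS[event] if failed else 1.0 - SPLIT_FRACTIONS[event]
        for event, failed in path
    )
    return INITIATING_FREQUENCY * prod(branches)


def damage_state_frequencies():
    """Sum scenario frequencies by damage state."""
    totals = defaultdict(float)
    for path, state in SCENARIOS:
        totals[state] += scenario_frequency(path)
    return dict(totals)


totals = damage_state_frequencies()
for state, frequency in totals.items():
    print(f"{state}: {frequency:.2e} per flight")
```

Because the scenarios partition all outcomes of the initiating failure, the damage state frequencies (including the OK state) sum back to the initiating frequency, which is a useful consistency check.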

The  results  of  this  study  are  presented  in  terms  of  probability 
distributions  and  the  risk  contributions  that  make  up  those 
distributions.  The  probability  distributions  are  discussed  in 
terms  of  the  following: 

a.  Prelaunch  and  ascent  (Stage  A)  risk  profiles  for  each 
damage  state  and  the  interpretation  of  the  profiles 

b.  Entire  flight  (Stages  A and  B combined)  risk  profiles  for 
each  damage  state  and  the  interpretation  of  the  profiles 


The risk contributors are discussed in terms of the following:

a.  Description  of  failure  scenarios  in  order  of  their  importance 
to  the  risk  profiles 

b.  Description  of  APU  component  failure  modes  in  order  of  their 
importance  to  the  risk  profiles 


8.1 RISK PROFILES


8.1.1  Risk  Profiles  for  Stage  A:  Prelaunch  and  Ascent 

The  probability  distributions  shown  in  Figure  8-1  represent  the 
study  conclusions  about  (1)  the  frequency  with  which  APU  failures 
would  result  in  loss  of  crew  or  vehicle,  (2)  the  frequency  of 
flights  in  which  APU  failures  would  result  in  launch  scrub,  and 
(3)  the  frequency  with  which  APU  failures  would  result  in  intact 
aborts  during  ascent.  The  time  employed  in  the  determination  of 
(1) and (3) above includes the period from lift-off to APU
shutdown after the OMS-1 burn. The time employed in the determination
of  (2)  includes  the  period  from  prelaunch  APU  start  to  lift-off. 

The  frequency  of  launch  scrubs  includes  only  failures  which 
prevent  APU  start  and  those  which  prevent  continued  APU  operation. 
Those  failure  modes  which  represent  only  a violation  of  launch 
commit  criteria  were  omitted  due  to  limited  time  and  resources,  in 
order  to  simplify  the  model.  The  failure  modes  and  failure 
scenarios  contributing  to  these  risk  curves  are  presented  in  Tables 
8-1  through  8-8,  and  are  discussed  later.  Figure  8-2  shows  all 
three  probability  distributions  plotted  on  a common  scale. 

A great  deal  of  information  is  contained  in  these  distributions 
even  without  looking  further  into  what  scenarios  contribute  most 
to  them.  The  results  show  that  the  range  of  possible  frequencies 
of  LOC/V  during  ascent  lies  between  1 in  250  and  1 in  100,000 
flights.  That  is,  one  should  not  expect  an  LOC/V  before  250 
flights,  but  LOC/V  is  almost  certain  within  100,000  flights.  We 
are  90%  confident  that  the  frequency  with  which  APUs  would  cause 
loss  of  crew  or  vehicle  during  ascent  lies  between  1 in  about 
42,000  flights  (5th  percentile)  and  1 in  about  1,400  flights  (95th 
percentile). The median frequency of occurrence is 1 in about
9600 flights (50th percentile), and the average frequency of
occurrence is 1 in about 3800 flights (mean).

APU CONTRIBUTIONS TO FLIGHT RISK

PROOF-OF-CONCEPT STUDY RESULTS -
NOT APPROVED FOR DESIGN EVALUATION
OR FLIGHT CERTIFICATION

From this study, one may conclude that very little risk is
associated with an APU causing LOC/V during ascent during the life of
the Shuttle program. Similarly, the results show that the range
of possible frequencies of APU-caused launch scrubs lies between
1 in about 13 flights and 1 in about 90 flights. We are 90%
confident that the fraction of flights in which APUs would cause
a launch scrub lies between 1 in about 68 flights (5th percentile)
and 1 in about 20 flights (95th percentile).

The  average  frequency  with  which  APUs  would  cause  a launch  scrub 
was  estimated  to  be  1 in  about  32  flights.  This  result  is  quite 
consistent  with  the  observed  data  of  one  APU-related  launch  scrub 
in  the  first  25  flights.  The  average  frequency  with  which  a PLS 
would  be  declared  because  of  APU  malfunctions  was  estimated  to  be 
1 in  about  120  flights.  No  probability  distribution  is  provided 
for  the  PLS  case;  however,  the  PLS  risk  contributors  are  presented 
in  Tables  8-1  and  8-6. 

The  study  results  show  that  the  range  of  possible  frequencies  of 
APU-caused  intact  aborts  is  between  1 in  about  2000  ascents  and  1 
in about 10,000 ascents. That is, one should not expect an APU-caused
intact abort before 2000 flights, but an intact abort is
almost  certain  within  10,000  flights.  We  are  90%  confident  that 
the  frequency  with  which  APUs  would  cause  an  intact  abort  during 
ascent  lies  between  1 in  about  6,600  flights  (5th  percentile)  and 
1 in  about  1,700  flights  (95th  percentile).  The  median  frequency 
of  occurrence  is  once  in  about  3,600  flights  (50th  percentile), 
and  the  mean  occurrence  is  once  in  about  3,000  flights.  From 
this  study,  one  may  conclude  that  there  is  little  risk  associated 
with  an  APU  causing  an  intact  abort  during  the  life  of  the 
Shuttle  program. 
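
The "1 in N flights" figures quoted in this section can be converted to the chance of seeing at least one occurrence in a given number of flights. The sketch below is illustrative only: it assumes independent flights and a constant per-flight frequency equal to the quoted mean, which is a simplification of the probability distributions in the study. The function name is hypothetical.

```python
def prob_at_least_one(one_in_n, flights):
    """Chance of one or more occurrences in the given number of flights."""
    per_flight = 1.0 / one_in_n
    return 1.0 - (1.0 - per_flight) ** flights


# Launch scrub: mean of 1 in about 32 flights, over the first 25 flights.
scrub_in_25 = prob_at_least_one(32, 25)
expected_scrubs_in_25 = 25 / 32

# LOC/V during ascent: mean of 1 in about 3800 flights, over 100 flights.
locv_in_100 = prob_at_least_one(3800, 100)

# Intact abort: mean of 1 in about 3000 flights, over 100 flights.
abort_in_100 = prob_at_least_one(3000, 100)

print(f"P(at least one launch scrub in 25 flights) = {scrub_in_25:.2f}")
print(f"expected launch scrubs in 25 flights = {expected_scrubs_in_25:.2f}")
print(f"P(at least one LOC/V in 100 flights) = {locv_in_100:.3f}")
print(f"P(at least one intact abort in 100 flights) = {abort_in_100:.3f}")
```

Under these assumptions, roughly 0.8 launch scrubs would be expected in 25 flights, which agrees with the observed one APU-related scrub in the first 25 flights.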


8.1.1.1 The Effects of APU Redundancy

The occurrence of a declared PLS associated with APUs is
relatively more likely than the other in-flight damage states. This
is  because  any  permanent  failure,  any  detected  leak,  or  any  other 
malfunctions  associated  with  declaring  an  APU  lost  after  lift-off 
would  lead  to  a PLS  in  accordance  with  the  Flight  Rules.  Since 
one  of  these  malfunctions  on  any  one  of  the  three  APUs  can  cause 
a PLS,  the  presence  of  three  APUs,  in  effect,  increases  the 
potential for a PLS.

The risk model leads to an intact abort if any one of the three
APUs fails during the thrust bucket. This damage state also
derives  no  benefit  from  the  fact  that  there  are  three  APUs. 

The  occurrence  of  a launch  scrub  is  also  relatively  likely 
because  any  failure  to  start  an  APU,  any  failure  of  an  APU 
to  continue  running  during  the  5 minutes  before  lift-off,  or 
virtually any other detected malfunction in an APU during this
time  period  would  lead  to  a launch  scrub.  Again,  redundancy 
is  not  a benefit  for  this  damage  state. 

The frequency of the intact abort and PLS damage states is aggravated
by the predominately series nature of each APU. Only the
isolation  valves  and  control  valves  exhibit  some  redundancy  to 
prevent  an  APU  from  failing  during  operation.  In  series  systems 
every  component  must  succeed  for  the  system  to  succeed.  The 
failure  frequencies  of  components  in  series  are  summed  to  obtain 
the  system  failure  frequency.  For  the  above  damage  states,  all 
of  the  components  (or  redundant  component  pairs)  of  all  three 
APUs  are  summed. 
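
The series summation described above can be sketched in a few lines of
Python. This is a minimal illustration, not the study's model: the
component names and per-flight frequencies are hypothetical, and the
redundant pair is treated as two independent failures, which ignores the
common cause effects discussed elsewhere in this section.

```python
# Illustrative only: every name and number below is hypothetical and is
# not taken from the study.

def redundant_pair(freq_a, freq_b):
    """Failure frequency of a redundant pair, assuming independent failures:
    both members must fail for the pair to fail."""
    return freq_a * freq_b


def series_frequency(frequencies):
    """Rare-event approximation for a series system: the system fails if any
    element fails, so small failure frequencies are simply summed."""
    return sum(frequencies)


# Hypothetical per-flight failure frequencies for the elements of one APU.
single_apu_elements = [
    1.0e-4,                           # gas generator (hypothetical)
    2.0e-4,                           # fuel pump (hypothetical)
    5.0e-5,                           # lube oil system (hypothetical)
    redundant_pair(1.0e-2, 1.0e-2),   # redundant isolation valve pair
]

one_apu = series_frequency(single_apu_elements)

# For damage states triggered by the loss of any one APU, the three APUs
# are themselves in series, so their frequencies are summed again.
three_apus = series_frequency([one_apu] * 3)

print(f"one APU: {one_apu:.2e} per flight")
print(f"any of three APUs: {three_apus:.2e} per flight")
```

The sketch shows why three APUs triple the frequency of these damage
states: the sum runs over every component of every APU.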

Redundancy  does,  however,  benefit  the  frequency  of  loss  of  crew 
or  vehicle.  All  scenarios  leading  to  this  damage  state  involve 
failure  of  at  least  two  APUs  or  one  APU  and  flight  critical 
equipment during ascent. Unfortunately, the effects of cascading
damage  from  failure  modes  such  as  turbine  fragmentation  and  the 
occurrences  of  common  cause  failures  limit  the  benefits  of  this 
redundancy.  Cascading  damage  effects  are  aggravated  by  the  close 
proximity  of  APUs  1 and  2 and  the  close  proximity  of  these  APUs 
to  flight  critical  equipment  such  as  the  liquid  propellant  lines 
of the main engines. Without the common cause and spatial interaction
effects, the fraction of flights leading to loss of crew or
vehicle  would  be  about  one-half  the  current  assessed  value. 


8.1.2  Mission  Risk  Profile 

The  probability  distribution  shown  in  Figure  8-3  represents  the 
study  conclusions  about  the  frequency  with  which  APU  failures 
would result in loss of crew or vehicle from lift-off to APU shutdown
after wheelstop (whole flight). These data were derived from
the Stage A (ascent) and Stage B (orbit and entry) risk models
combined.

Figure 8-3. Probability Distribution for LOC/V - Entire Mission

The results show that the range of possible APU-caused LOC/V
frequencies  lies  between  1 in  about  15  flights,  and  1 in  about 
300  flights.  That  is,  one  should  not  expect  an  LOC/V  before  15 
flights,  and  an  LOC/V  is  almost  certain  within  300  flights.  We 
are  90%  confident  that  the  frequency  with  which  APUs  would  cause 
loss  of  crew  or  vehicle  lies  between  1 in  about  215  flights  (5th 
percentile) and 1 in about 35 flights (95th percentile). Interpretation
of the risk profile (Figure 8-3) suggests that, based
on  this  study,  there  is  a substantial  risk  of  an  APU-caused  LOC/V 
during  the  life  of  the  Shuttle  program. 
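
To see what a per-flight frequency implies over many flights, the
following sketch computes the chance of at least one APU-caused LOC/V in
a given number of flights. It assumes that flights are independent and
that the per-flight frequency is constant; the program length of 100
flights is a hypothetical value chosen only for illustration.

```python
def prob_at_least_one(per_flight_frequency, flights):
    """Chance of one or more events in `flights` independent flights."""
    return 1.0 - (1.0 - per_flight_frequency) ** flights


# Bounds quoted in the text for APU-caused LOC/V.
LOW_FREQUENCY = 1.0 / 215.0    # 5th percentile
HIGH_FREQUENCY = 1.0 / 35.0    # 95th percentile

# Hypothetical program length, not a figure from the study.
PROGRAM_FLIGHTS = 100

low_risk = prob_at_least_one(LOW_FREQUENCY, PROGRAM_FLIGHTS)
high_risk = prob_at_least_one(HIGH_FREQUENCY, PROGRAM_FLIGHTS)

print(f"Chance of at least one LOC/V in {PROGRAM_FLIGHTS} flights: "
      f"{low_risk:.2f} to {high_risk:.2f}")
```

Changing PROGRAM_FLIGHTS shows how quickly the cumulative chance grows
with the number of flights at either bound.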

The  average  frequency  with  which  APU-initiated  failures  would 
cause a loss of crew or vehicle is 1 in about 70 flights.
Comparison  of  this  number  with  the  one  in  about  3,800  for 
ascent  clearly  shows  that  the  risk  of  the  APUs  to  the  vehicle 
occurs  predominately  after  ascent.  In  fact,  ascent  accounts 
for  only  1.3%  of  the  total  risk  to  the  vehicle  during  a flight. 

The  breakdown  of  the  Stage  B risk  into  scenarios,  as  will  be 
discussed  in  succeeding  sections,  tells  us,  further,  that  the 
APU  risk  is  predominately  associated  with  entry. 

The  1 in  70  flights  is  far  more  frequent  than  would  be  expected 
from 2 APUs failing independently during the flight. If independent
failures were the only contributors to loss of vehicle, then
the  frequency  would  be  about  10  times  lower.  This  indicates  that 
cascading  effects  across  subsystem  boundaries  and  common  cause 
failures  play  a very  significant  role  in  the  risk  profile. 
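
The comparison with independent failures can be illustrated with a simple
binomial sketch. It assumes three identical APUs that fail independently
with the same per-flight probability, and loss of vehicle when two or
more fail. The per-APU probability is back-solved from the "about 10
times lower" figure purely for illustration; it is not a value reported
by the study.

```python
from math import comb


def prob_at_least_k_of_n(p, k=2, n=3):
    """Probability that at least k of n identical, independent units fail."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))


def solve_single_unit_prob(target, iterations=60):
    """Bisection for the per-unit probability that yields `target`.
    Valid because prob_at_least_k_of_n increases with p on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if prob_at_least_k_of_n(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


assessed = 1.0 / 70.0                # mean LOC/V frequency from the text
independent_only = assessed / 10.0   # "about 10 times lower"

# Hypothetical per-APU failure probability implied by the assumptions above.
p_single = solve_single_unit_prob(independent_only)

print(f"implied per-APU failure probability: {p_single:.4f}")
print(f"check: {prob_at_least_k_of_n(p_single):.6f} "
      f"vs target {independent_only:.6f}")
```

Under these assumptions the remaining nine-tenths of the assessed
frequency would have to come from dependent effects such as cascading
damage and common cause failures.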

The  results  show  that  small  fuel  leaks  from  the  APUs  into  the  aft 
compartment  initiate  cascading  effects  that  include  fires  and 
hydrazine  damage  to  wiring  insulation.  The  consequential  damage 
to  APUs  or  other  flight  critical  equipment  in  the  aft  compartment 
from  these  hydrazine  effects  is  the  most  important  cause  of  loss 
of  crew  or  vehicle.  Leakage  of  fuel  is  one  of  the  more  frequent 
malfunctions  of  the  APUs.  The  database  indicated  that  leakage 
into  the  aft  compartment  could  occur  once  in  about  3,700  hours  of 
flight  time  for  each  APU.  The  APUs  can  develop  a fuel  leak  any 
time  during  the  flight.  A typical  flight  exposes  each  APU  to 
hydrazine  for  approximately  154  hours.  Therefore,  APU  leakage 
into the aft compartment can be expected on the average approximately
once in every eight or nine flights. The information
presented  in  Sections  6.6  and  7.6  indicated  that  damage  to  two 
APUs  or  flight  critical  equipment,  given  that  hydrazine  leakage 
into  the  aft  compartment  has  occurred,  is  very  likely  during 
entry.  The  likelihood  of  loss  of  crew  or  vehicle  because  of  a 
leak  occurring  on  orbit  or  during  entry  is,  therefore,  also  high. 
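
The "eight or nine flights" figure follows from the numbers quoted above.
The sketch below reproduces the arithmetic, assuming that the three APUs
leak independently at a constant rate; the Poisson treatment in the
second half is an added assumption, not something stated in the study.

```python
from math import exp

HOURS_PER_LEAK = 3700.0    # from the text: one leak per ~3,700 h per APU
EXPOSURE_HOURS = 154.0     # from the text: hydrazine exposure per flight
NUM_APUS = 3

# Expected number of leaks per flight, summed over the three APUs.
expected_leaks_per_flight = NUM_APUS * EXPOSURE_HOURS / HOURS_PER_LEAK
flights_per_leak = 1.0 / expected_leaks_per_flight

# Assumption: leaks arrive as a Poisson process, so the chance of at
# least one leak on a flight is 1 - exp(-expected count).
prob_any_leak = 1.0 - exp(-expected_leaks_per_flight)
flights_per_leaky_flight = 1.0 / prob_any_leak

print(f"expected leaks per flight: {expected_leaks_per_flight:.3f}")
print(f"one leak per {flights_per_leak:.1f} flights on average")
print(f"one flight in {flights_per_leaky_flight:.1f} has at least one leak")
```

Both ways of counting land between eight and nine flights, consistent
with the statement in the text.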

The reasons that ascent represents a small portion of the risk
to the vehicle are related to the likelihood of leakage and
consequential damage from leakage. First, ascent represents
only  about  0.1%  of  the  total  exposure  time  for  leaks  to  develop. 
Second,  the  aft  compartment  is  purged  with  nitrogen  to  prevent 
fires  from  occurring.  Third,  the  concentration  of  oxygen  in  the 
atmosphere  quickly  becomes  too  low  to  support  combustion  as  the 
vehicle  ascends.  And  fourth,  the  acceleration  vector  during 
ascent  tends  to  force  hydrazine  to  migrate  away  from  ignition 
sources  and  critical  APU  equipment.  APU  equipment  malfunctions 
also  provide  less  of  a contribution  to  risk  during  ascent  because 
of  the  shorter  run  time.  Furthermore,  failures  of  an  APU  to 
start  during  stage  A contribute  to  launch  scrub  and  not  to  loss 
of  crew  or  vehicle.  During  entry,  on  the  other  hand,  failures 
to  start  do  contribute  to  loss  of  crew  and  vehicle. 


8.2  DESCRIPTION  OF  RISK  CONTRIBUTORS 


8.2.1  Failure  Scenario  Importance  Ranking 

Risk  can  be  identified  by  two  different  means,  each  with  its  own 
advantages.  The  first  method  is  a ranking  of  failure  scenarios 
that contribute to each risk category; i.e., LOC/V, Loss of
Mission,  Launch  Scrub,  etc.  This  method  clarifies  the  sequences 
of related or unrelated events that can lead to each damage state
of interest.

The  second  method  is  a ranking  of  individual  component  failures  or 
groups  of  components  that  contribute  risk  in  one  or  more  of  the 
failure  scenarios.  This  second  method  focuses  on  the  individual 
component  failures  that  contribute  most  greatly  to  the  failure 
scenarios.  These  are  the  component  failures  which,  if  eliminated 
or  reduced  in  frequency,  can  significantly  lower  the  overall  risk 
to  the  vehicle  associated  with  the  APU. 


8.2.1.1 Loss of Crew or Vehicle During Ascent

Three  scenarios  provide  98%  of  the  risk  of  loss  of  crew  and 
vehicle  due  to  APU  failures  during  ascent  (bear  in  mind  that 
ascent  represents  only  1.3%  of  the  overall  APU  risk).  These  are 
ranked  and  summarized  in  Table  8-1A.  The  most  risk-significant 
scenario  (71%  of  the  frequency  of  loss  of  crew  and  vehicle) 
involves  failure  of  an  APU  turbine  such  that  the  turbine  breaks 
into high energy fragments while it is operating at normal speed
between  lift-off  and  MECO.  Breakup  can  occur  either  from  a flaw 
which  could  contribute  to  accelerated  crack  propagation,  from 
fatigue,  or  from  other  causes.  Inspection  of  HPU  turbines  after 
flight  has  consistently  shown  cracks  in  turbine  blades,  and 
several  incidents  of  turbine  blade  loss  have  occurred  during  APU 
and  HPU  testing.  To  date,  however,  no  turbine  wheel  hub  has  come 
apart  at  normal  speed. 

Turbine  breakup,  of  course,  guarantees  the  failure  of  at  least 
one  APU.  The  turbine  may  fail  in  a way  that  causes  it  to  wobble 
on  its  axis  of  rotation  such  that  when  it  comes  apart,  the  pieces 
are thrown out of the normal plane of rotation and miss the
containment  ring.  Tests  have  demonstrated  that  the  portion  of  the 
turbine  housing  that  is  not  reinforced  by  the  containment  ring 
cannot  retain  the  fragments.  These  fragments  could  become  high 
energy  projectiles  capable  of  damaging  other  equipment  in  the  aft 
compartment,  including  other  APUs. 

Potential  targets  within  the  projected  path  of  the  shrapnel  were 
determined,  and  the  strength  of  the  materials  that  could  be 
struck  was  analyzed.  Among  the  likely  targets  of  the  shrapnel 
(as  discussed  in  Section  6.6)  are  the  liquid  oxygen  and  liquid 
hydrogen  propellant  lines  that  pass  from  the  external  tank 
through  the  aft  compartment  to  the  main  engines.  It  appears  that 
shrapnel  could  be  energetic  enough  to  pierce  both  the  outer  and 
inner  shells  of  these  lines.  Over-pressurization  of  the  aft 
compartment,  as  well  as  fire  and  explosion,  are  likely  outcomes 
of  this  event.  Shrapnel  could  also  hit  and  damage  hydraulic 
lines,  electrical  wiring,  and  other  APUs.  The  APU  fuel  tanks  are 
within  the  spray  pattern  of  shrapnel  from  APU  1 or  APU  2.  There 
is  also  some  chance  that  a substantial  hydrazine  leak  from  a fuel 
tank  could  strip  the  Kapton  insulation  from  electrical  wiring  in 
the  aft  compartment,  thus  failing  other  APUs  or  flight  critical 
hardware  despite  the  fact  that  the  nitrogen  purge  of  the  aft 
compartment  prevents  the  hydrazine  from  igniting. 

The  next  most  significant  scenario  to  loss  of  crew  or  vehicle 
during  ascent  accounts  for  23%  of  the  risk.  It  involves 
independent  APU  failures  such  that  two  APUs  cease  to  operate 
after  lift-off.  Even  if  the  APUs  should  fail  after  MECO,  it  is 
assumed  that  entry  and  landing  cannot  be  successful  on  only  one 
APU.  While  the  split  fraction  models  described  in  Section  6.5 
present  numerous  potential  equipment  failure  combinations  that 
cumulatively  provide  the  total  frequency  of  failures  of  two  APUs, 
one  of  these  combinations  has  been  assessed  as  contributing  about 
70%  of  the  frequency  of  this  scenario.  This  combination  entails 
common cause restriction of lube oil circulation in two APUs
during ascent. Inadequate lube oil circulation causes a rapid
overheat and failure of the bearings on the rotating equipment
in the gearbox. The restriction may be caused by hydrazine leakage
from the fuel pump seal through the drain cavity and into the
gearbox  via  the  gearbox  shaft  seal  which  shares  the  same  seal 
drain  cavity,  or  it  may  be  caused  by  foreign  substances  introduced 
into  the  gearboxes  during  ground  servicing.  The  APU  flight 
history  database  (see  Section  7)  exhibits  several  occurrences  of 
high  lube  oil  pressure  and  partial  blockage  of  the  lube  oil 
filter.  Two  such  occurrences  were  on  the  same  flight,  and 
resulted  in  a launch  scrub.  The  database  also  reveals  several 
incidents  of  contamination  of  the  lube  oil  by  H2O,  including 
contamination  of  all  three  APUs  on  the  same  flight. 

Hydrazine  reacts  with  the  lube  oil  to  form  a waxy  substance  that 
collects  on  the  lube  oil  filters  and  eventually  blocks  them. 

Among  the  identified  commonality  of  causes  that  covered  two 
APUs  were  choice  of  incompatible  materials  (lube  oil  and 
hydrazine), design and fabrication of the seals and seal drain
system  that  allowed  the  two  materials  to  intermingle,  and 
failure  to  adequately  inspect  and  clean  the  filters  between 
missions.  Although  a flushing  and  inspection  procedure  has 
been  added  to  lube  oil  system  refurbishment,  the  other  two 
causes  remain  for  the  baseline  APU.  This  problem  should  be 
eliminated  by  the  Improved  APU  seal  cavity  design. 

The  third  risk-significant  scenario  accounts  for  4%  of  the 
frequency  of  loss  of  crew  and  vehicle.  This  scenario  involves 
failure  of  both  the  primary  and  secondary  fuel  control  valves  in 
the  open  position  coupled  with  failure  of  the  overspeed  shutdown 
function  to  close  the  secondary  valve.  If  both  the  primary  and 
secondary  valves  fail  open  for  more  than  about  200  milliseconds 
beyond  the  normal  pulse  period,  the  turbine  speed  could  increase 
to  a speed  (about  108,000  rpm)  at  which  the  turbine  hub  would 
come  apart.  The  containment  ring  could  not  withstand  the  energy 
of  the  fragments  and  the  APU  housing  would  be  pierced,  sending 
high  energy  shrapnel  through  the  aft  compartment.  Even  if  the 
fuel  isolation  valves  are  closed  by  the  overspeed  shutdown 
circuit,  sufficient  hydrazine  remains  in  the  lines  downstream 
of  the  isolation  valves  to  power  the  turbine  to  overspeed.  The 
remainder  of  the  scenario  is  as  described  above  for  turbine 
rupture  at  normal  speed. 


8.2.1.2 Launch Scrub


Table 8-1B shows that about 97% of the frequency of launch scrub
is  provided  by  two  scenarios.  The  most  important  scenario 
contributes  87%  of  the  launch  scrub  frequency.  This  scenario 
represents  those  APU  failures  that  occur  upon  attempting  to  start 
the  APUs  at  5 minutes  before  lift-off.  A ranking  of  the  most 
important  start  failures  and  their  percent  contribution  is 
presented in Table 8-1B.

The  other  scenario  accounts  for  10%  of  the  launch  scrub  frequency. 
This  scenario  involves  failures  of  equipment  in  a single  APU 
during  the  5 minutes  of  run  time  before  lift-off.  These  are 
failures  that  would  cause  the  APU  to  cease  operating.  As  noted 
earlier,  violations  of  launch  commit  criteria  that  allow  the  APUs 
to  continue  operating  were  not  included  in  the  scope  of  this 
study.  It  was  assumed  that  the  launch  control  center  could 
detect  an  APU  failure  and  scrub  the  launch  essentially  right  up 
to  the  launch  command.  A ranking  of  the  most  important 
individual  APU  run  failures  and  their  percent  contribution  is 
presented in Table 8-1B.


8.2.1.3 Intact Aborts and PLS

Failures of individual APUs to continue operating after launch are
the most important scenarios for these damage states. Tables 8-1C
and 8-1D summarize the most important APU run failures and their
percent  contributions  to  the  frequencies  of  these  damage  states. 


8.2.1.4 Loss of Crew or Vehicle Over the Entire Flight

The  scenarios  most  important  to  risk  to  the  vehicle  from  lift-off 
through  APU  shutdown  after  wheelstop  are  summarized  in  Table  8-2. 
Six  scenarios  provide  61%  of  the  risk.  They  involve  leakage  of 
hydrazine  from  any  one  of  the  APUs  or  any  two  of  the  APUs  in 
combination.  Twenty  five  percent  of  the  risk  of  these  scenarios 
comes  from  combinations  of  two  APUs  leaking  fuel  during  the  same 
mission.  This  reflects  the  in-flight  experience  in  which  two  APUs 
leaked fuel during the same mission (STS-9). The study assumed
that an APU could develop a leak during orbit. Since the APUs
are  shut  down  during  orbit,  the  leak  may  not  reveal  itself  until 
after  the  APUs  have  started  for  entry.  Even  then  a small  but 
dangerous  leak  could  go  undetected;  this  study  assumed  that  such 
leaks  are  not  detected  (as  opposed  to  large  leaks,  which  would  be 
detected). Similarly, a small leak could develop any time during
entry.  The  study  assumed  that  leakages  or  leakage-induced 
failures  that  occur  after  wheelstop  would  not  cause  a loss  of 
crew  or  vehicle. 

The  damage  caused  by  leakage  of  hydrazine  stems  from  three  of  its 
physical  properties.  First,  hydrazine  is  corrosive  to  certain 
materials.  One  such  material  is  the  Kapton  wiring  insulation 
used  extensively  in  the  aft  compartment.  Second,  hydrazine  is 
flammable in as little as about 4% oxygen. Hot spots on the APUs
themselves  can  provide  an  ignition  source  for  a hydrazine-oxygen 
mixture.  Third,  hydrazine  will  auto-decompose  to  nitrogen, 
hydrogen  and  ammonia  in  an  exothermic  reaction  when  it  comes  in 
contact  with  certain  materials  such  as  metal  oxides,  which  may  be 
present  in  the  aft  compartment.  Because  of  these  properties,  the 
experience  with  STS-9,  and  the  density  of  critical  equipment  in 
the  aft  compartment,  a high  conditional  probability  (given  that 
a fuel  leak  has  occurred)  was  assigned  (by  expert  opinion)  to 
the  event  that  a fuel  leak  in  an  APU  would  lead  to  loss  of  crew 
and  vehicle  during  entry.  These  conditional  probabilities  are 
discussed  in  Section  7.6;  they  range  between  about  0.2  and  0.6. 

The  study  recognized  that  fires  cannot  occur  until  the  vehicle 
is  sufficiently  low  in  the  atmosphere.  As  a result,  the  study 
assumed  that  neither  an  APU  nor  flight  critical  equipment  would 
fail  above  about  65,000  feet. 

Another 16% of the risk of loss of crew or vehicle comes from 17
scenarios  that  involve  hydrazine  leakage  either  preceded  by  or 
followed  by  an  independent  failure  of  an  APU.  Such  failures  could 
occur  upon  starting  the  APUs  for  entry,  while  the  APUs  are  running 
during entry, or from a hydrazine leak into a solenoid cavity within
the gas generator valve module. Fuel leakage into a solenoid
cavity was assumed to trigger an auto-decomposition reaction that
causes  a rupture  of  the  valve  cover  and  a loss  of  that  APU. 

Start  failures,  run  failures,  and  heater  failures  of  two  APUs 
during  orbit  and  entry  comprise  about  9%  of  the  risk.  Table 
8-2  summarizes  and  ranks  the  individual  APU  failures  that  are 
important  contributors  to  these  scenarios. 

About  4%  of  the  risk  was  assessed  to  be  initiated  by  hydrazine 
leakage  into  the  solenoid  cavity  of  one  of  the  fuel  tank  isolation 
valves. If an auto-decomposition reaction and rupture of the valve
cover  followed  this  leakage,  then  the  contents  of  the  hydrazine 
tank  were  assumed  to  be  dumped  into  the  aft  compartment.  This 
event,  therefore,  was  assumed  to  lead  to  loss  of  crew  and  vehicle. 

The risk associated with turbine shrapnel from both normal speed
and  overspeed  conditions  comprises  about  4%  of  the  total  risk. 
Occurrence  of  this  incident  during  entry  accounts  for  about  four- 
fifths  of  this  4%.  Only  about  one-twentieth  of  the  risk  from 
shrapnel  is  attributable  to  turbine  overspeed,  as  opposed  to 
turbine  breakup  at  normal  speed. 

Only  about  1%  of  the  estimated  risk  to  the  vehicle  due  to  APU- 
initiated scenarios is associated with a first-day PLS entry
condition.  These  scenarios  involve  an  APU  failure  during  ascent 
and  another  APU  failure  during  the  abbreviated  remainder  of  the 
mission  (about  5.7  hours). 

The  remaining  5%  of  the  risk  is  distributed  among  all  other 
scenarios  modeled  in  the  event  trees. 
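
As a quick consistency check, the whole-flight risk shares quoted in this
section can be tabulated and summed. The percentages are those given in
the text; the category labels are shortened paraphrases, and the shrapnel
split simply applies the quoted fractions (four-fifths during entry,
one-twentieth from overspeed).

```python
# Percent of whole-flight LOC/V risk by scenario group, as quoted in the text.
risk_shares = {
    "hydrazine leakage from one or two APUs": 61.0,
    "leakage plus an independent APU failure": 16.0,
    "start, run, and heater failures of two APUs": 9.0,
    "leak into fuel tank isolation valve solenoid cavity": 4.0,
    "turbine shrapnel (normal speed and overspeed)": 4.0,
    "first-day PLS entry": 1.0,
    "all other scenarios": 5.0,
}

total = sum(risk_shares.values())

# Split of the shrapnel share using the fractions quoted in the text.
shrapnel = risk_shares["turbine shrapnel (normal speed and overspeed)"]
shrapnel_during_entry = shrapnel * 4.0 / 5.0
shrapnel_from_overspeed = shrapnel / 20.0

for name, share in risk_shares.items():
    print(f"{share:5.1f}%  {name}")
print(f"{total:5.1f}%  total")
print(f"shrapnel during entry: {shrapnel_during_entry:.1f}% of total risk")
print(f"shrapnel from overspeed: {shrapnel_from_overspeed:.1f}% of total risk")
```

The quoted shares account for the full 100% of the risk, with hydrazine
leakage scenarios making up more than three-quarters of it.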


8.2.2  Component  Failure  Importance  Ranking 

Another  way  to  dissect  the  results  is  to  perform  sensitivity 
studies  on  the  importance  of  individual  risk  contributors  to  the 
overall  frequency  of  each  damage  state.  This  was  done  by  numerous 
requantifications  of  the  APU  risk  model.  For  each  requantification, 
a specific  failure  was  assigned  a failure  frequency  of  zero.  In 
other  words,  the  component  was  assumed  to  be  perfect  with  respect 
to  that  failure  mode.  In  general,  the  requantification  yields 
an  estimate  of  the  damage  state  frequency  that  is  lower  than  the 
base  case.  The  following  importance  parameter  was,  therefore, 
used  to  rank  the  individual  failure  modes: 


$$I_j = \frac{\text{BASELINE QUANTIFICATION} - j\text{th REQUANTIFICATION}}{\text{BASELINE QUANTIFICATION}}$$


The  results  shown  in  Tables  8-3  through  8-8  are  normalized  by  a 
factor representing the summation of all I_j. The failures shown
in the tables represent 90% or more of their respective damage
state frequencies.
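
The importance parameter and its normalization can be sketched as
follows. The baseline and requantified frequencies are hypothetical
placeholders, not study results; in the study each requantification came
from rerunning the APU risk model with one failure frequency set to zero.

```python
def importance(baseline, requantified):
    """Fractional reduction in damage state frequency when failure mode j
    is assumed perfect: (baseline - requantified_j) / baseline."""
    return {name: (baseline - value) / baseline
            for name, value in requantified.items()}


def normalize(raw):
    """Scale the importance values so that they sum to 1."""
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# Hypothetical damage state frequencies (per flight).
baseline = 1.0e-2
requantified = {
    "fuel leak": 4.0e-3,         # frequency with fuel leaks set to zero
    "turbine breakup": 9.5e-3,   # frequency with turbine breakup set to zero
    "start failure": 9.0e-3,     # frequency with start failures set to zero
}

raw = importance(baseline, requantified)
normalized = normalize(raw)

# Rank failure modes from most to least important.
ranking = sorted(normalized, key=normalized.get, reverse=True)
for name in ranking:
    print(f"{name:16s} {100.0 * normalized[name]:5.1f}%")
```

Normalizing lets the contributions be read as percentages of the ranked
total, which is how the sketch assumes the tables are presented.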

Table  8-7  represents  the  results  of  the  first  iteration  of  the 
loss  of  crew/vehicle  for  an  entire  flight.  The  large  (74.6%) 
contribution from the general category of hydrazine leaks downstream
of the isolation valves, and the desire to rank the risk
contributors to a finer detail, led to a second iteration. The
second iteration results are shown in Table 8-8 and identify,
more specifically, the risk points of leakage downstream of the
isolation valves. For example, 71.6% of this risk can be attributed
to the first three leak sources. Fuel leakage into the fuel
isolation  valve  remains  high  on  the  risk  table.  This  second 
iteration  is  a result  of  modifying  the  model  to  quantify  individual 
fuel  leak  sources  and  to  eliminate  the  "MPU  fail  high"  failures, 
which were determined to be non-credible. This recalculation of
the  results  was  facilitated  by  the  capability  of  the  PRA  process 
to  readily  support  iterations  of  the  results  for  a more  detailed 
examination  of  particular  risk  contributors  or  to  incorporate  new 
information. 

The  iteration  was  performed  by  setting  the  likelihoods  of  the  "MPU 
fail  high"  failures  to  zero  in  the  existing  model,  and  by  expanding 
the  fuel  leak  model  to  encompass  specific  leak  sources.  This 
expanded  fuel  leak  model  took  the  form  of  a fault  tree.  The  fuel 
leak  events  were  quantified  by  a Bayesian  update  based  on  a 
combination  of  similarity  data  and  Shuttle  flight  history  data. 
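
A Bayesian update of a leak rate can be sketched with a conjugate
gamma-Poisson model. The text does not say which prior form the study
used, so the gamma prior here is an assumption, and all numbers are
hypothetical. The prior stands in for the similarity data and the
evidence stands in for the Shuttle flight history.

```python
def gamma_poisson_update(prior_shape, prior_rate, events, exposure_hours):
    """Conjugate update of a gamma prior on a Poisson event rate.
    Returns the posterior (shape, rate) parameters."""
    return prior_shape + events, prior_rate + exposure_hours


def gamma_mean(shape, rate):
    """Mean of a gamma distribution parameterized by shape and rate."""
    return shape / rate


# Hypothetical prior from similarity data: mean of 2.5e-4 leaks per hour.
prior_shape, prior_rate = 0.5, 2000.0

# Hypothetical flight history: 2 leaks observed in 5,000 exposure hours.
observed_leaks, observed_hours = 2, 5000.0

post_shape, post_rate = gamma_poisson_update(
    prior_shape, prior_rate, observed_leaks, observed_hours)

print(f"prior mean:     {gamma_mean(prior_shape, prior_rate):.2e} per hour")
print(f"posterior mean: {gamma_mean(post_shape, post_rate):.2e} per hour")
```

The posterior mean lies between the prior mean and the observed rate,
weighted by how much exposure each source represents.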

A second  iteration  was  not  performed  to  eliminate  the  "MPU  Fail 
High"  failure  modes,  shown  in  Tables  8-3  through  8-4,  due  to  time 
constraints.  However,  it  can  be  safely  assumed  that  these 
failures  would  drop  from  the  top  99%  category,  and  that  other 
failure modes would move up accordingly.


8.3  ASSESSMENT  OF  STUDY  RESULTS 

The contributions of this PRA pilot study are significant because
of the following achievements.

a.  The  study  was  able  to  develop  a multistage  model  that 
identified and ranked scenarios leading from APU
failures to loss of crew or vehicle over the entire
flight from lift-off to APU shutdown after wheelstop.

b.  The  study  identified  and  ranked  scenarios  leading  from  APU 
failures  to  loss  of  crew  and  vehicle  during  ascent.  Ascent 
was  found  to  represent  only  about  1%  of  the  total  risk  of 
loss  of  crew  and  vehicle. 

c.  The  study  identified  and  ranked  scenarios  leading  from  APU 
failures  to  loss  of  crew  or  vehicle  during  orbit,  entry, 
and  landing.  The  risk  from  entry  and  landing  so  dominated 
the  risk  of  the  flight  that  the  overall  flight  risk  is 
essentially  equal  to  the  risk  from  entry  and  landing. 

d. The study identified and ranked scenarios leading from APU
failures  to  other  damage  states,  namely,  launch  scrub, 
intact  abort  and  first-day  PLS  entry. 

e.  The  study  identified  and  ranked  the  individual  component 
failures  or  groups  of  failures  that  contributed  to  each 
damage  state.  The  results  show  that  the  bulk  of  the  risk 
for  each  damage  state  is  contributed  by  a relatively  small 
number  of  failures.  The  PRA  results  suggest  that  these 
failure modes should receive additional attention in order
to  achieve  significant  risk  reduction. 

f.  The  study  discovered  that  spatial  interactions,  failure 
effects  propagating  within  the  subsystem,  failure  effects 
cascading  into  other  subsystems,  and  common  cause  failures 
led  to  a risk  that  was  far  greater  than  from  independent 
APU  failures  alone. 

g.  The  study  found  that  the  proximity  of  the  APUs  to  each  other 
and  to  flight  critical  equipment  in  the  aft  compartment, 
coupled  with  the  APUs'  potential  for  releasing  hydrazine  and 
shrapnel,  and  the  requirement  for  two  of  three  APUs  for  safe 
flight,  constitute  the  bulk  of  the  risk  of  these  subsystems 
to  the  vehicle. 

The risk of loss of crew or vehicle from the APUs is clearly dominated
by leakage of hydrazine leading to propagating and cascading
effects of fire, hydrazine corrosion, hydrazine decomposition reactions,
and possibly detonation. These effects were assessed to
lead  to  failure  of  two  APUs  or  flight  critical  equipment  during 
entry  and  landing  with  a frequency  between  0.2  and  0.6,  given  that 
a leakage  has  occurred.  The  high  conditional  frequency  of  loss  of 
crew  and  vehicle  given  a hydrazine  leak  in  an  APU  resulted  from 
the  recognition  that  the  aft  compartment  is  crowded  with  equipment 
that  is  susceptible  to  the  effects  of  hydrazine  and  within  very 
close  proximity  to  the  source  of  hydrazine.  There  are  no 
effective  barriers  between  the  hydrazine  source  and  much  of  the 
equipment in the aft compartment. Unfortunately, the APUs themselves
provide sufficient heat in the presence of oxygen to
ignite  leaking  hydrazine.  The  effects  of  hydrazine  ignition  were 
dramatically  demonstrated  at  the  end  of  the  flight  of  STS-9. 

It  should  be  noted  that  certain  changes  in  APU  design  and 
operations  have  been  implemented,  or  are  in  the  process  of  being 
implemented,  which  should  reduce  the  risk  associated  with  the 
APUs.  These  changes  include: 

a. Improved fuel leak detection procedures during APU turnaround
operations 

b.  Turbine  wheel  crack  mapping  program 

c.  Fuel  isolation  valve  redesign  to  eliminate  source  of  fuel 
leaks  into  solenoid  cavity 

d.  Chromized  gas  generator  injector  tubes  to  reduce  likelihood 
of  fuel  leaks 

e. Redesign of fuel pump/gearbox seal cavity to eliminate fuel/lube
oil mixing

The PRA proof-of-concept study model was built and quantified
based  on  the  pre-STS-51L  configuration  except  where  otherwise 
noted,  and  does  not  reflect  any  of  the  changes  noted  above. 

A design  change  that  would  further  reduce  the  risk  posed  by  the 
APUs  would  be  to  erect  barriers  to  isolate  each  APU  from  the  rest 
of  the  aft  compartment.  Properly  designed,  these  barriers  would 
prevent  or  reduce  the  amount  of  hydrazine  entering  the  compartment 
due  to  leakage,  and  would  also  serve  to  reduce  the  detrimental 
effects  of  shrapnel  produced  by  turbine  breakup  during  flight. 

Another  approach  to  reducing  overall  risk  is  to  certify  that  the 
vehicle  is  capable  of  operating  throughout  the  flight  envelope 
(ascent  as  well  as  entry)  on  a single  APU.  A significant 
reduction  to  the  risk  of  LOC/V,  as  determined  from  this  study, 
would  result  since  the  study  was  heavily  influenced  by  the 
assumption  that  two  APUs  were  required  for  safe  flight. 

TABLE 8-1A (Page 1 of 8)

IMPORTANCE RANKING OF APU FAILURE SCENARIOS
LOC/V - ASCENT

RANK  FAILURE SCENARIO RISK CONTRIBUTORS                    % CONTRIBUTION

1     Turbine failure leading to shrapnel induced failure        70.5
      of second APU or flight critical equipment between
      launch and MECO

      Contributors and % Contribution to Scenario 1:

      a. Turbine fragmentation at normal speed (100%)

2     Equipment failure of 2 APUs between launch and APU         23.4
      shutdown after MECO

      Contributors and % Contribution to Scenario 2:

      a. Lube oil circulation restricted in two APUs,
         causing bearing overheat and failure of rotating
         equipment in gearbox from both common
         cause and independent failures (73%)

      b. Lube oil circulation restricted in one APU and
         independent failure of primary fuel control
         valve (stays closed while pulsing) (6%)

      c. Two primary fuel valves in two APUs fail
         closed while pulsing (5%)

      d. Either MPU 2 or MPU 3 fails high* in one APU
         and a primary fuel valve fails closed in
         another APU (3%)

      e. Restricted lube oil circulation in one APU and
         either MPU 2 or MPU 3 fails high* in another
         APU (2%)

      f. Turbine wheel failure in one APU and primary
         fuel valve fails closed in another APU (1%)

* Later information indicates that MPU fail high may not
  be a credible failure mode
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TABLE  8-1A  ( Concluded ) 


Page  2 of  8 


FAILURE  SCENARIO  RISK  CONTRIBUTORS 


g.  Isolation  valve  switch  fails  to  open  upon  APU 
shutdown  in  one  APU  and  primary  fuel  valve 
fails  closed  during  pulsing  in  another  APU  (1%) 

h.  Turbine  wheel  fails  in  one  APU  and  lube  oil 
circulation  is  restricted  in  another  APU  (0.7%) 

i.  Isolation  valve  drivers  fail  on  upon  APU  shut- 
down in  one  APU  and  primary  fuel  valve  fails 
closed  while  pulsing  in  another  APU  (0.6%) 

j.  combinations  of  MPU,  primary  valve,  lube  oil 
circulation,  spurious  controller  actuation, 
turbine  wheel  failure,  gas  generator  failures, 
and  isolation  valve  drivers  and  switches  (6%) 


3 Turbine overspeed leading to fragmentation of the  3.8
hub and shrapnel-induced failure of a second APU or
flight critical equipment

Contributors and % Contribution to Scenario 3:

a. MPU 3 fails low and secondary fuel valve fails
to close due to mechanical failure (44%)

b. Primary fuel valve fails open during pulsing
and secondary fuel valve fails to close on
demand due to mechanical failure (29%)

c. MPU 3 fails low and secondary fuel valve fails
open due to mechanical failure during pulsing
(13%)

d. Primary and secondary fuel valves fail open
during pulsing due to mechanical failures (9%)

All Others  2.3

Total  100.0


* Later information indicates that MPU fail high may not be
a credible failure mode

TABLE 8-1B

IMPORTANCE RANKING OF APU FAILURE SCENARIOS
LAUNCH SCRUB - ASCENT

RANK  FAILURE SCENARIO RISK CONTRIBUTORS  % CONTRIBUTION

1 Failure to start an APU at 5 minutes prior to lift-off  87.4

Contributors and % Contribution to Scenario 1:

a.  Secondary  fuel  control  valve  leaks  before 
isolation  valves  are  opened,  leading  to 
elevated  gas  generator  temperature  (29%) 

b.  Secondary  fuel  valve  fails  to  open  on  demand 
at  start  (11%) 

c.  Primary  fuel  valve  fails  to  close  on  demand 
at  start  (11%) 

d.  MPU  1 fails  low  on  demand  at  start  (8%) 

e.  Electric  power  to  secondary  fuel  valve  is  lost 
(7%) 

f. MPU 1, 2, or 3 fails high* on demand at start
(5% each = 15%)

g.  Fuel  pump  bypass  valve  fails  to  open  on  demand 
at  start  (5%) 

h.  Fuel  pump  bypass  valve  fails  to  close  after 
normal  pump  pressure  is  reached  (5%) 

i.  Loss  of  electric  power  to  fuel  tank  isolation 
valve  at  start  (4%) 

j. Primary or secondary fuel valve controller
output fails off on demand at start (4%)


* Later information indicates that MPU fail high may not be
a credible failure mode

TABLE 8-1B (Concluded)

RANK  FAILURE SCENARIO RISK CONTRIBUTORS  % CONTRIBUTION

2 Failure of an APU to continue operating after start  9.5
and before lift-off

Contributors and % Contribution to Scenario 2:

SEE FAILURE OF AN APU TO OPERATE UNDER INTACT ABORT
(TABLE 8-1C)

3 Spurious shutdown of any one APU after start and  2.6
before lift-off

Contributors and % Contribution to Scenario 3:

SEE SPURIOUS SHUTDOWN UNDER PLS (TABLE 8-1D)

All Others  0.5

Total  100.0

* Later information indicates that MPU fail high may not be
a credible failure mode
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TABLE  8- 1C 

IMPORTANCE RANKING OF APU FAILURE SCENARIOS
INTACT ABORT - ASCENT

RANK  FAILURE SCENARIO RISK CONTRIBUTORS  % CONTRIBUTION

1 Failure of an APU to operate while in the thrust bucket  76.6

Contributors  and  % Contribution  to  Scenario  1: 

a.  Primary  fuel  valve  fails  closed  during 
pulsing  (43%) 

b.  Lube  oil  circulation  restricted  (26%) 

c.  MPU  2 fails  high*,  causing  secondary  fuel 
valve  to  close  (6%) 

d.  MPU  3 fails  high*,  causing  primary  fuel 
valve  to  close  (6%) 

e.  Turbine  wheel  failure  (6%) 

f.  Fuel  pump  filter  blocked  (2%) 

g.  Gas  generator  fails  to  operate  (1%) 

h.  Lube  oil  pump  fails  to  run  (0.8%) 

i.  Fuel  pump  fails  to  run  (0.8%) 

j. Loss of electric power to secondary fuel
valve (0.5%)

k.  Secondary  fuel  valve  controller  output 
failure  (0.3%) 


Later  information  indicates  that  MPU  fail  high  may  not  be 
a credible  failure  mode 
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TABLE  8— ic  (Concluded) 

RANK  FAILURE SCENARIO RISK CONTRIBUTORS  % CONTRIBUTION

2 Spurious shutdown of any one APU while in the  20.7
thrust bucket

Contributors and % Contribution to Scenario 2:

SEE SPURIOUS SHUTDOWN UNDER PLS (TABLE 8-1D)

3 All Others  2.7

Total  100.0

TABLE 8-1D

IMPORTANCE RANKING OF APU FAILURE SCENARIOS
APU STAGE A (ASCENT)
PRIMARY LANDING SITE

RANK  FAILURE SCENARIO RISK CONTRIBUTORS  % CONTRIBUTION

1 Failure of an APU to continue operating after  77.1
lift-off, except in the thrust bucket

Contributors and % Contribution to Scenario 1:

a. Primary fuel valve fails closed during
pulsing (40%)

b. Lube oil circulation restricted (24%)

c. MPU 2 fails high*, causing secondary fuel
valve to close (6%)

d. MPU 3 fails high*, causing secondary fuel
valve to close (6%)

e. Turbine wheel failure (6%)

f. Isolation valve switch fails to open upon
APU shutdown (4%)

g. Isolation valve drivers fail on (3%)

h. Fuel pump filter blocked (2%)

i. Gas generator fails to operate (1%)

j. Lube oil pump fails to run (0.8%)

k. Fuel pump fails to run (0.8%)

* Later information indicates that MPU fail high may not be
a credible failure mode

TABLE 8-1D (Concluded)

RANK  FAILURE SCENARIO RISK CONTRIBUTORS  % CONTRIBUTION

2 Spurious shutdown of any one APU after lift-off  20.8
except in the thrust bucket

Contributors and % Contribution to Scenario 2:

a. MPU 1 fails low, causing underspeed trip
(77%)

b. MPU 1 fails high*, causing overspeed trip
(22%)

3 All Others  2.1

Total  100.0

* Later information indicates that MPU fail high may not be
a credible failure mode
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TABLE  8-2 


Page  1 of  4 


IMPORTANCE RANKING OF APU FAILURE SCENARIOS
LOC/V - WHOLE MISSION

RANK  FAILURE SCENARIO RISK CONTRIBUTORS  % CONTRIBUTION

1 Hydrazine leak downstream of fuel isolation  39.1
valves and into aft compartment during orbit or
entry that leads to failure of two APUs or
flight critical equipment

Contributors : 

a.  Leakage  from  any  one  APU  (100%) 

2 Hydrazine  leak  as  above,  but  from  two  or  three  26.5 

APUs  concurrently 

Contributors : 

a.  Leakage  from  combinations  of  two  APUs  (91%) 

b.  Leakage  from  three  APUs  (9%) 

3 Hydrazine  leak  from  a single  APU  as  above,  with  6.4 

an  independent  failure  of  another  APU 

Contributors : 

a.  Hydrazine  leak  in  one  APU,  with  equipment 
failure  of  another  APU  while  running  (see 
below  for  breakdown  into  APU  failure  modes) 

(88%) 

b.  Hydrazine  leak  in  one  APU,  with  start 
failure  of  another  APU  (see  below  for 
breakdown  into  APU  failure  modes)  (12%) 

4 Equipment  failure  of  two  APUs  during  orbit,  5.0 

entry,  or  landing  (failures  not  related  to  APU 

start) 

a.  Lube  oil  circulation  restricted  on  two  APUs 
(16%) 

b.  Primary  fuel  valve  fails  closed  while 
pulsing  on  one  APU  and  fuel  tank  GN2  quick 
disconnect  leaks  on  another  APU  (7%) 
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TABLE  8-2  (Continued) 

RANK  FAILURE SCENARIO RISK CONTRIBUTORS  % CONTRIBUTION


c.  Lube  oil  circulation  restricted  in  one  APU, 
and  primary  fuel  valve  fails  open  while 
pulsing  on  another  APU  (6%) 

d.  Primary  fuel  valve  fails  closed  while 
pulsing  in  two  APUs  (6%) 

e.  Primary  fuel  valve  fails  closed  while 
pulsing  on  one  APU,  and  fuel  tank  diaphragm 
leaks  on  another  APU  (4%) 

f.  Lube  oil  circulation  restricted  in  one  APU, 
and  fuel  tank  GN2  quick  disconnect  leaks  on 
another  APU  (4%) 

g.  Fuel  tank  diaphragm  leak  on  one  APU,  and 
fuel  tank  GN2  quick  disconnect  leaks  on 
another  APU  (3%) 

h.  Next  36  scenarios  have  combinations  of  lube 
oil  circulation  restricted,  tank  diaphragm 
leaks,  primary  fuel  valve  closure,  nitrogen 
leak  from  fuel  tank,  MPU  failures,  turbine 
failures,  and  loss  of  power  to  fuel  tank 
isolation  valves  (34%) 

5 Fail to start one APU at TIG-5 in orbit and
equipment failure of second APU while running

Contributors : 

IMPORTANT APU START FAILURES:

a.  Secondary  fuel  valve  fails  to  open  on 
demand  to  start  (18%) 

b.  MPU  1 fails  low  on  demand  to  start  (14%) 

c.  Electric  power  to  secondary  fuel  valve 
fails  at  start  (11%) 

d.  MPU  1 fails  high*  (9%) 


* Later information indicates that MPU fail high may
not be a credible failure mode

TABLE 8-2 (Continued)

RANK  FAILURE SCENARIO RISK CONTRIBUTORS  % CONTRIBUTION

e.  MPU  2 fails  high*  (9%) 

f.  MPU  3 fails  high*  (9%) 

g.  Fuel  pump  bypass  valve  fails  closed  (9%) 

h.  Fuel  pump  bypass  valve  fails  open  (9%) 

i.  Electric  power  to  fuel  tank  isolation  valve 
fails  at  start  (7%) 

IMPORTANT  APU  EQUIPMENT  FAILURES: 

j.  Primary  fuel  valve  fails  closed  during 
pulsing  (19%) 

k. Fuel tank GN2 fill quick disconnect fails
open (13%)

l.  Heaters  fail  off  by  common  cause  (14%) 

m.  Lube  oil  circulation  restricted  (12%) 

n.  Fuel  tank  diaphragm  leaks  (8%) 

o.  Fuel  tank  nitrogen  leakage  (3%) 

p.  MPU  2 fails  high*  (3%) 

q.  MPU  3 fails  high*  (3%) 

r.  Turbine  wheel  failure  (3%) 

6 Hydrazine leaks into isolation valve solenoid,  3.8
auto-decomposes, ruptures valve cover, and
contents of fuel tank are dumped into aft
compartment

Contributors : 

a. Leakage into solenoid cavity (100%)

* Later information indicates that MPU fail high may
not be a credible failure mode

TABLE 8-2 (Concluded)

Page 4 of 4

RANK  FAILURE SCENARIO RISK CONTRIBUTORS  % CONTRIBUTION

7 Turbine comes apart at normal speed during  3.1
entry; shrapnel and hydrazine effects fail
a second APU or flight critical equipment

Contributors : 

a.  Turbine  wheel  comes  apart  and  escapes 
housing  (100%) 

8 Hydrazine  leak  from  two  APUs  as  above,  with  an  1.9 

independent  failure  of  another  APU 

Contributors : 

a.  Leakage  with  equipment  failure  of  APU  while 
running  (100%) 

9 Turbine comes apart at normal speed during  0.9
ascent; shrapnel effects fail a second APU or
flight critical equipment

Contributors : 

a.  Turbine  wheel  comes  apart  and  escapes 
housing  (100%) 

10  Equipment  failure  of  one  APU  during  ascent  and  0.9 

another  during  orbit  or  entry 

Contributors : 

a.  Breakdown  of  APU  failures  provided 
previously 

11  All  Others  8.4 

TOTAL  100.0 
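
The percentage contributions in a ranking table of this kind can be sketched as each scenario's share of the total frequency for the damage state. The short Python example below is only an illustration under that assumption; the function name `rank_contributions` and the three scenario frequencies are hypothetical and are not taken from the study.

```python
def rank_contributions(frequencies):
    """Rank scenarios by frequency and express each as a percentage of the total.

    `frequencies` maps a scenario name to its (hypothetical) frequency.
    Returns a list of (rank, name, percent_contribution) tuples.
    """
    total = sum(frequencies.values())
    ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, name, 100.0 * freq / total)
        for rank, (name, freq) in enumerate(ordered, start=1)
    ]


# Illustrative values only; they do not come from the report.
example_frequencies = {
    "scenario A": 3.0e-5,
    "scenario B": 1.0e-5,
    "scenario C": 6.0e-5,
}

ranking = rank_contributions(example_frequencies)
for rank, name, percent in ranking:
    print(f"{rank:>2}  {name:<12} {percent:5.1f}")
```

By construction the percentages sum to 100, which is the same check that can be applied to the "Total" row of each table.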
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TABLE  8-3 


IMPORTANCE RANKING OF APU FAILURES
LOC/V - ASCENT

RANK  COMPONENT/ASSEMBLY FAILURE RISK CONTRIBUTORS  % CONTRIBUTION

1 Turbine Wheel Failure  45.09
2 Lube Oil Circulation Restricted  22.67
3 Primary Fuel Valve Fails Closed During Pulsing  20.48
4 MPU 2 Fails High*  2.50
5 MPU 3 Fails High*  2.50
6 Isolation Valve Switch Fails to Open On Demand  1.65
7 Secondary Fuel Valve Fails to Close on First Demand  1.28
8 Isolation Valve A Drivers Fail On  0.88
9 MPU 3 Fails Low  0.85
10 Fuel Pump Filter Blocked  0.39
11 Primary Fuel Valve Fails Open During Pulsing  0.36
12 Gas Generator Fails to Operate  0.10
13 All Other Failures  1.25

Total  100.00

NOTE: Proof-of-concept study results. Not approved
for design evaluation or flight certification.

* Later information indicates that MPU fail high may
not be a credible failure mode
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TABLE  8-4 


IMPORTANCE  RANKING  OF  APU  FAILURES 
LAUNCH  SCRUB 


RANK  COMPONENT/ASSEMBLY FAILURE RISK CONTRIBUTORS  % CONTRIBUTION

1 Secondary Fuel Valve Leaks Before APU Start  29.3
2 Secondary Fuel Valve Fails to Open On Demand  10.6
3 Primary Fuel Valve Fails to Close on the First Demand  10.6
4 MPU 1 Fails Low  8.0
5 Loss of Electrical Power to Secondary Fuel Valve  6.6
6 MPU 1 Fails High*  4.9
7 MPU 2 Fails High*  4.9
8 MPU 3 Fails High*  4.9
9 Fuel Pump Bypass Valve Fails to Open at APU Start  4.8
10 Fuel Pump Bypass Valve Fails to Close After APU Start  4.8
11 Loss of Power to Fuel Tank Isolation Valves  4.0
12 Primary Fuel Valve Fails Closed During Pulsing  3.5
13 Lube Oil Circulation Restricted  1.9
14 Secondary Fuel Valve Controller Output Fails Off On Demand  1.1
15 All Other Failures  0.1

Total  100.0

NOTE: Proof-of-concept study results. Not approved
for design evaluation or flight certification.

* Later  information  indicates  that  MPU  fail  high  may  not 
be  a credible  failure  mode 
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TABLE  8-5 


IMPORTANCE  RANKING  OF  APU  FAILURES 
INTACT  ABORT 


RANK  COMPONENT/ASSEMBLY FAILURE RISK CONTRIBUTORS  % CONTRIBUTION

1 Primary Fuel Valve Fails Closed During Pulsing  34.1
2 Lube Oil Circulation Restricted  19.5
3 MPU 1 Fails Low  17.0
4 MPU 1 Fails High*  5.0
5 MPU 2 Fails High*  5.0
6 MPU 3 Fails High*  5.0
7 Turbine Wheel Failure  5.0
8 Fuel Pump Filter Blocked  2.0
9 Gas Generator Fails to Operate  1.1
10 Lube Oil Pump Fails to Run  1.0
11 Fuel Pump Fails to Run  1.0
12 Loss of Electrical Power to Secondary Fuel Valve  0.5
13 Secondary Fuel Valve Controller Output Fails Off  0.5
14 All Other Failures  8.3

Total  100.0

NOTE: Proof-of-concept study results. Not approved
for design evaluation or flight certification.

* Later information indicates that MPU fail high may not
be a credible failure mode

TABLE 8-6

IMPORTANCE RANKING OF APU FAILURES
PLS

RANK  COMPONENT/ASSEMBLY FAILURE RISK CONTRIBUTORS  % CONTRIBUTION

1 Primary Fuel Valve Fails Closed During Pulsing  40.5
2 Lube Oil Circulation Restricted  24.0
3 MPU 2 Fails High*  5.8
4 Fuel Tank Isolation Valve Switch Fails to Open On Demand  4.3
5 Turbine Wheel Failure  3.0
6 Fuel Tank Isolation Valve Drivers Fail On  2.8
7 Over/Under Speed Control Circuit Spuriously Closes Secondary Fuel Valve  2.4
8 MPU 1 Fails Low  2.1
9 Fuel Pump Filter Blocked  1.8
10 MPU 1 Fails High*  1.8
11 Gas Generator Fails to Operate  1.3
12 All Other Failures  10.2

Total  100.0

NOTE: Proof-of-concept study results. Not approved
for design evaluation or flight certification.

* Later information indicates that MPU fail high may not
be a credible failure mode




TABLE  8-7 


Page  1 of  2 


IMPORTANCE RANKING OF APU FAILURES
LOC/V - WHOLE FLIGHT - 1st ITERATION

RANK  COMPONENT/ASSEMBLY FAILURE RISK CONTRIBUTORS  % CONTRIBUTION

1 Fuel System Leak Into Aft Compartment From Location Downstream of Isolation Valve  74.6
2 Leak Into Fuel Isolation Valve Solenoid Cavity  3.8
3 Turbine Wheel Failure  3.8
4 Leak Into Primary Valve Solenoid Cavity (GGVM Detonation)  2.9
5 Primary Valve Fails Closed at APU Start  2.4
6 Lube Oil Circulation Restricted  2.3
7 Fuel Tank GN2 Fill Q.D. Leakage (Low Fuel Tank Pressure)  1.8
8 Any MPU Fails High at APU Start*  1.3
9 Fuel Tank Diaphragm Leakage  1.2
10 Secondary Fuel Valve Fails to Open at APU Start  0.9
11 Heater Pair 116/117 Fails Off on Orbit  0.8
12 Any MPU Fails High While APU is Running*  0.7
13 MPU 1 Fails Low at APU Start  0.7
14 Loss of Power to Secondary Fuel Valve at APU Start  0.6


* Later  information  indicates  that  MPU  fail  high  may  not 
be  a credible  failure  mode 
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TABLE  8-7  (Concluded) 


RANK  COMPONENT/ASSEMBLY FAILURE RISK CONTRIBUTORS  % CONTRIBUTION

15 Loss of Power to Fuel Tank Isolation Valves at APU Start  0.6
16 Fuel Tank GN2 Leakage  0.5
17 Fuel Pump Bypass Valve Fails to Close After APU Start  0.4
18 Heater Pair 111/112 Fails Off On Orbit  0.3
19 Secondary Fuel Valve Controller Output Fails Off at APU Start  0.1
20 Fuel Isolation Valve Fails to Close at APU Shutdown (GGVM Large Leak)  0.08
21 Fuel Isolation Valve Leaks at Closure After Ascent  0.08
22 Loss of Power to Secondary Fuel Valve While APU is Running  0.02
23 Primary Fuel Valve Controller Output Fails On While APU Running  0.01
24 Secondary Fuel Valve Controller Output Fails Off While APU Running  0.01
25 All Other Failures  0.10

Total  100.00

NOTE: Proof-of-concept study results. Not approved
for design evaluation or flight certification.




TABLE 8-8

Page 1 of 2

IMPORTANCE RANKING OF APU FAILURES
LOC/V - WHOLE FLIGHT - 2nd ITERATION

RANK  COMPONENT/ASSEMBLY FAILURE RISK CONTRIBUTORS  % CONTRIBUTION

1 Leakage From Gas Generator Injector Tube  35.5
2 Leakage From Fuel Lines and Fittings  23.3
3 Leakage From Fuel Pump  12.8
4 Leak Into Fuel Isolation Valve Solenoid Cavity  4.0
5 Leak Into Primary Valve Solenoid Cavity (GGVM Detonation)  3.3
6 Primary Valve Fails Closed While Pulsing  3.1
7 External Leakage From GGVM  3.0
8 Lube Oil Circulation Restricted  2.8
9 Fuel Pump Shaft Seal Detonation  1.8
10 Fuel Tank GN2 Fill Q.D. Leakage (Low Fuel Tank Pressure)  1.7
11 Heater Pair 111/112 Fails Off On Orbit  1.6
12 Heater Pair 116/117 Fails Off On Orbit  1.4
13 Fuel Tank Diaphragm Leakage  1.1
14 Secondary Fuel Valve Fails To Open At APU Start  0.9
15 MPU 1 Fails Low At APU Start  0.7
16 Loss Of Power To Secondary Fuel Valve At APU Start  0.5
17 Loss of Power To Fuel Tank Isolation Valves At APU Start  0.5
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TABLE  8-8  (Concluded) 


RANK  COMPONENT/ASSEMBLY FAILURE RISK CONTRIBUTORS  % CONTRIBUTION

18 Turbine Wheel Failure  0.4
19 Fuel Tank GN2 Leakage  0.4
20 Fuel Pump Bypass Valve Fails To Close After APU Start  0.3

Subtotal  99.1

21 Leakage From Fuel Line Flex Hose  0.30
22 Secondary Fuel Valve Controller Output Fails Off At APU Start  0.09
23 Leakage From Fuel High Point Bleed Q.D.  0.05
24 Leakage From Fuel Test Port Q.D.  0.04
25 Fuel Isolation Valve Fails To Close At APU Shutdown  0.04
26 Fuel Isolation Valve Leaks At Closure After Ascent  0.04
27 Loss of Power To Secondary Fuel Valve While APU Is Running  0.04
28 Primary Fuel Valve Controller Output Fails On While APU Is Running  0.01
29 Secondary Fuel Valve Controller Output Fails Off While APU Is Running  0.01
30 All Other Failures  0.28

Total  100.00

NOTE: Proof-of-concept study results. Not approved
for design evaluation or flight certification.
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9.0  HPU  SCENARIO  PRESENTATION 


The  first  step  in  performing  a Probabilistic  Risk  Assessment 
(PRA)  is  the  task  of  damage  state  and  failure-sequence 
definition,  and  system  modeling.  This  task  begins  with  a 
definition  of  the  objectives  of  the  study  and  the  acquisition 
of  a substantial  amount  of  information  on  system  design  and 
operation.  It  progresses  through  the  generation  of  system 
models,  both  inductive  and  deductive,  to  the  identification  of 
the  failure-initiating  events,  component  failures,  procedural 
faults,  and  dependent-failure  mechanisms  that  could  cause  these 
failure  sequences  to  occur. 

In the subsections below, the methodology described in Section
5.0 is traced step-by-step through an analysis of the HPU
Subsystem. The results of this analysis provide the framework, or
model, which can then be evaluated using the failure frequency
data described in Section 10.0.

Section  9.1  details  the  damage  states  selected  for  the  analysis. 
Section  9.2  details  the  Master  Logic  Diagrams  (MLDs)  developed 
to  show  HPU-related  failure  combinations  which  can  lead  to  these 
damage  states. 

The event sequence diagrams are presented in Section 9.3. These
diagrams illustrate, in greater detail, how different damage
states can result as a consequence of various types of HPU failures.
The breakdown of HPU failure types and different damage
states developed in the event sequence diagrams provides the framework
for development of the event trees, presented in Section 9.4.

The event trees establish the decision points for which specific
probabilities must be determined in order to arrive at overall
probabilities for the ultimate damage states. The event trees
are similar to flow charts; each decision point is posed as a
yes/no question. Each path through the event tree results
in either a damage state or a state of no damage, based on system
insight gained through the preceding steps of the analysis.

Each decision point in the event tree must be assigned a probability,
called a split fraction. Determination of each split
fraction depends on a logical combination of events, which is
expressed in the form of a fault tree. The top event of the fault
tree is the event for which the split fraction is to be determined.
Development of these fault trees, or split fraction models, is
presented in Section 9.5.
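
The relationship between fault trees, split fractions, and event tree paths described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's model: it assumes independent basic events, the gate functions `or_gate` and `and_gate` are hypothetical helper names, and every probability shown is invented for the example.

```python
from math import prod


def or_gate(probs):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(probs):
    """Probability that all of several independent events occur."""
    return prod(probs)


# Hypothetical basic-event probabilities (illustrative only, not study data).
p_valve_fails = 1.0e-3
p_controller_fails = 5.0e-4
p_unit_a_lost = 1.0e-2
p_unit_b_lost = 1.0e-2

# Each split fraction is the top event of a small fault tree.
sf_first_failure = or_gate([p_valve_fails, p_controller_fails])
sf_both_units_lost = and_gate([p_unit_a_lost, p_unit_b_lost])

# Event tree: each path lists its branch probabilities and its end state.
paths = [
    ([1.0 - sf_first_failure], "no damage"),
    ([sf_first_failure, 1.0 - sf_both_units_lost], "loss of mission"),
    ([sf_first_failure, sf_both_units_lost], "loss of crew/vehicle"),
]


def end_state_totals(initiator_frequency, tree_paths):
    """Sum path probabilities (product of branch probabilities) per end state."""
    totals = {}
    for branches, state in tree_paths:
        path_prob = initiator_frequency * prod(branches)
        totals[state] = totals.get(state, 0.0) + path_prob
    return totals


totals = end_state_totals(1.0, paths)
for state, value in totals.items():
    print(f"{state:<22} {value:.3e}")
```

Because the branches at each decision point are complementary, the end-state probabilities for a unit initiator frequency sum to one, which gives a simple consistency check on any tree built this way.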
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9.1  HPU  DAMAGE  STATES 


The  damage  states  represent  the  ultimate  undesirable  events  for 
this  PRA.  The  damage  states  selected  for  this  study  were  not 
peculiar  to  the  HPU  under  study,  but  were  of  a broad  category 
which  would  encompass  any  of  the  Space  Shuttle's  subsystems.  In 
addition,  the  damage  states  were  selected  to  be  consistent  with 
the  National  Aeronautics  and  Space  Administration  (NASA)  Failure 
Mode  and  Effects  Analysis  (FMEA)  as  defined  in  National  Space 
Transportation  System  Instructions  for  Preparation  of  Failure 
Modes and Effects Analysis and Critical Items List (CIL) (NSTS-22206).
The ultimate damage states selected are Loss of Crew
and/or Vehicle (LOC/V) and Loss of Mission (LOM).

Loss of mission implies that the ability to perform all or a substantial
portion of the payload activities was lost. For the HPU
assessment, loss of mission is limited to "launch scrub" during
the prelaunch phase, which causes a launch delay representing a
missed window of opportunity for at least one payload. The loss
of mission damage state does not apply to the Ascent phase since
no HPU failures lead to an intact abort. Loss of crew and/or
vehicle is self-explanatory and applies to both the Prelaunch
and Ascent phases.

Not  all  HPU  subsystem  failure  modes  lead  to  either  of  these  two 
ultimate  damage  states.  The  analysis  involves  establishing  which 
failure  sequences  do  lead  to  these  damage  states,  and  attaching 
probabilities  to  them. 

Once  the  ultimate  damage  states  for  the  phases  were  defined,  the 
next  step  in  the  study  was  to  develop  a set  of  Master  Logic 
Diagrams  (MLDs)  using  the  damage  states  as  the  Top  Events  from 
which  to  build  failure  scenarios. 


9.2  MASTER  LOGIC  DIAGRAM  (MLD)  DEVELOPMENT 

After  the  damage  states  have  been  established  for  each 
mission  phase,  the  next  step  in  the  analysis  is  to  determine  how 
Hydraulic  Power  Unit  (HPU)  system  failures  can  initiate  scenarios 
that  lead  to  these  damage  states. 


9.2.1  General  Development  Process 

The damage states represent the top events for the mission stages
being analyzed (see Appendix C9.2-1). A damage state is the
outcome of a scenario. A damage state usually is an undesired
event selected because of a need to understand its frequency of
occurrence. The second level of each diagram was developed in
the form of broad categories depicting functional ways that might
lead to the top event or damage state. Not all of these Level II
events were developed further.

Level  III  of  the  MLD  introduces  failure  modes  that  were  judged  to 
be  results  of  HPU  system  failures,  and  succeeding  levels  break 
them  down  into  more  specific  functional  paths  until  specific  HPU 
system  failure  modes  appear  at  levels  as  low  as  Level  VI.  This 
"top  down"  approach  aids  in  identifying  unanticipated  failure 
effects  involving  the  HPU. 

Many  paths  were  developed  that  dealt  with  physical  processes 
about  which  there  is  some  uncertainty.  These  physical  processes 
were  flagged  as  technical  issues  to  be  resolved  through  in-house 
analysis,  technical  references,  and  reliance  on  expert  opinion. 
These  issues  deal  with  failure  effects  from  a hydrazine  fuel 
fire,  detonation  of  hydrazine,  shrapnel  due  to  turbine  wheel 
rupture  and  the  effects  of  hot  gas  due  to  an  exhaust  duct  leak. 
The  detailed  resolution  of  these  issues  is  discussed  in  Sections 
9.6  and  10.5. 


9.2.2  MLD  Descriptions 

An  MLD  was  developed  for  each  damage  state.  The  MLDs  presented 
in  Appendix  C9.2-1  and  C9.2-2  are  discussed  individually  below. 

MLD  #1  - Loss  of  Mission,  Launch  Scrub 

This  MLD  documents  HPU  failures  that  can  result  in  a launch 
scrub  by  violating  the  Space  Shuttle  Launch  Commit  Criteria. 

Any of the four HPUs can shut down, resulting in hydraulic
system performance degradation and Thrust Vector Control (TVC)
system malfunction, which would result in an automatic launch
scrub. An HPU can exhibit a performance degradation due to
high or low turbine speed, fuel system low pressure, or
malfunction or loss of system instrumentation prior to launch.
Violation of these HPU performance redlines would cause an
automatic launch hold and launch scrub.

MLD #2 - Loss of Crew and Vehicle

MLD #2 documents HPU failures that lead to loss of crew and
vehicle during the prelaunch and ascent phases. The effects
of HPU failures were determined to be associated with two of
the Level II major cause categories: (1) loss of control, and
(2) loss of vehicle structural integrity. Loss of vehicle
structural integrity applies to both the prelaunch and ascent
phases.

Loss of crew/vehicle scenarios for the prelaunch and ascent
phases involve loss of structural integrity due to high energy
detonations in the SRB aft skirt area. Analysis of high energy
release caused by HPU failures must consider such scenarios as:
detonation of hydrazine caused by shrapnel from an HPU turbine
coming apart, fire from a random- or shrapnel-induced hydrazine
leak, and an exhaust leak. These scenarios are spatial in nature
and their effects on loss of crew/vehicle are discussed in detail
in Sections 9.6 and 9.2.

Loss of vehicle control leads to loss of crew/vehicle during
the ascent phase. The HPU failures which lead to loss of
control are: loss of two HPUs in either Solid Rocket Booster (SRB)
and subsequent loss of hydraulic power, or loss of flight critical
equipment such as ATVC wiring or control electronics due to HPU
turbine shrapnel, hydrazine fire, or HPU exhaust leak.
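
The top-down structure of MLD #2 can be pictured as a small tree whose leaves are HPU failure modes. The Python sketch below is an illustration only: the nested-dictionary layout, the intermediate node names, and the helper `enumerate_paths` are assumptions made for this example, and the tree is far simpler than the diagrams in Appendix C9.2.

```python
# Hypothetical, heavily simplified rendering of MLD #2.
# Inner nodes are dicts; the lowest level is a list of failure modes.
mld_2 = {
    "Loss of crew/vehicle": {
        "Loss of control": {
            "Loss of hydraulic power": [
                "Loss of two HPUs in one SRB",
            ],
            "Loss of flight critical equipment": [
                "HPU turbine shrapnel",
                "Hydrazine fire",
                "HPU exhaust leak",
            ],
        },
        "Loss of vehicle structural integrity": {
            "High energy release in SRB aft skirt": [
                "Hydrazine detonation",
                "Hydrazine fire",
                "Exhaust leak",
            ],
        },
    }
}


def enumerate_paths(node, prefix=()):
    """Yield every path from the top event down to a leaf failure mode."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from enumerate_paths(child, prefix + (name,))
    else:
        for failure_mode in node:
            yield prefix + (failure_mode,)


mld_paths = list(enumerate_paths(mld_2))
for path in mld_paths:
    print(" -> ".join(path))
```

Listing the paths this way shows how a single failure mode, such as a hydrazine fire, can appear under more than one Level II category.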


9.3  EVENT  SEQUENCE  DIAGRAM  FOR  HPU  INITIATED  SCENARIOS 

Event Sequence Diagrams (ESDs) illustrate sequences of events
leading from initial failure categories, defined by the master
logic diagrams, to damage states. They tell how an initial failure
(i.e., failure mode) causes a damage state (i.e., effect).
When quantified by the use of event trees and fault trees, the
scenarios and the events within the scenarios can be ranked with
respect to their importance to a damage state such as Loss of
Crew/Vehicle (LOC/V).


9.3.1  Interpretation  of  the  ESD 

One ESD was developed to represent both prelaunch and ascent
stages of the Hydraulic Power Unit (HPU) mission, and is
presented in Figure 9.3-1. The model includes the time from HPU
start at Time of Ignition L/O -30 seconds to Solid Rocket Booster
Separation (SRB SEP) (about 2.1 minutes after launch).

The ESD was developed solely from the perspective of HPU performance
during the mission. Interfacing systems were out of scope,
as were scenarios that couple performance margins of other
systems with the HPU. For example, coupling the scenarios of
Auxiliary Power Unit (APU) failures with HPU failures was not
attempted in this study.


The boxes in an ESD ask questions about the occurrence (or non-occurrence)
of a category of events. For example, the question
in Figure 9.3-1, "Hydraulic System OK?", may be viewed as asking
a large number of questions. Each question would refer to a
component in the hydraulic system, for example, "Is the pump
OK?". An ESD is not meant to illustrate the detailed logic that
is involved in determining combinations of failure modes that
lead to HPU failure. This is achieved in the split fraction
models described in Section 9.5. An ESD illustrates the overall
flow of events that lead from an initial HPU failure to Shuttle
damage states such as LOC/V and LOM.


9.3.1.1 Interpretation of Initial Failure Categories

The  questions  relating  to  the  initial  failure  categories  are 
found  in  the  boxes  across  the  top  of  the  ESD.  The  categories 
are  phrased  as  questions  such  that  a successful  event  (i.e.  no 
initial  failure)  receives  a "yes"  answer  to  the  question  and  a 
horizontal  line  is  then  followed  to  the  next  event.  For  example, 
the  initial  failure  categories  of  equipment  failure,  turbine 
overspeed,  fuel  leakage,  and  exhaust  gas  leak  are  represented 
in  Figure  9.3-1,  as  follows: 

1.  No  permanent  HPU  failures?  (equipment  failures) 

2. Turbine speed control OK? (turbine overspeed)

3.  Fuel  boundary  remains  intact?  (fuel  leak) 

4.  Exhaust  gas  boundary  remains  intact?  (exhaust  gas  leak) 

The  question  "No  hydraulic  system  failures?"  is  also  asked,  even 
though  the  hydraulic  system  is  out  of  the  scope  of  this  PRA,  to 
demonstrate  how  an  ESD  can  diagram  the  interdependencies  between 
subsystems  and  include  sequences  of  events  that  cross  subsystems. 

A line pointing downward from an initial failure category indicates
that an initial failure has occurred (i.e. a "no" answer
to the question). A sequence of boxes and lines that follow the
arrows  from  an  initial  failure  to  a damage  state  is  called  a 
scenario.  A success  of  the  HPU  occurs  when,  according  to  the 
principles  of  scenario  structuring  described  in  Section  5,  all 
the answers to the questions across the top of the ESD are yes.


FIGURE 9.3-1 HPU EVENT SEQUENCE DIAGRAM (ASCENT)


Since the boxes across the top represent a complete set of
initiating failure categories, in the absence of initiating
failures the HPU must have operated successfully. Any scenario
that has a vertical (down) line must, therefore, be less than
completely successful. The actual damage of the scenario depends
on the number and type of subsequent failures and the timing
of these failures. The ESD explicitly shows cascading damage
associated with spatial interactions as well as functional
dependencies and independent failures.
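
The reading rule for the boxes across the top of the ESD can be sketched in a few lines of Python. This is only an illustration and not part of the study: the function name `first_initial_failure`, the list `TOP_QUESTIONS`, and the dictionary of answers are hypothetical, and the question order simply follows the numbered list of initial failure categories given earlier in this section.

```python
# Top-of-ESD questions paired with their initial failure categories.
# Hypothetical encoding; the order follows the numbered list in the text.
TOP_QUESTIONS = [
    ("No permanent HPU failures?", "equipment failure"),
    ("Turbine speed control OK?", "turbine overspeed"),
    ("Fuel boundary remains intact?", "fuel leak"),
    ("Exhaust gas boundary remains intact?", "exhaust gas leak"),
]


def first_initial_failure(answers):
    """Return the first initial failure category answered "no", or None.

    `answers` maps a question to True ("yes", follow the horizontal line)
    or False ("no", follow the line pointing downward). A question that
    is missing from `answers` is treated as "yes" in this sketch.
    """
    for question, category in TOP_QUESTIONS:
        if not answers.get(question, True):
            return category  # a downward line: a scenario starts here
    return None  # every answer is "yes": the HPU operated successfully


if __name__ == "__main__":
    print(first_initial_failure({}))
    print(first_initial_failure({"Fuel boundary remains intact?": False}))
```

The sketch stops at the first "no" answer. The ESD itself continues to ask the remaining questions, so that a scenario can contain more than one failure.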


9.3.1.2 Diagramming Dependencies in an ESD

An  example  of  a functional  dependency  is  shown  in  the  sequence 
initiated  by  a failure  of  the  hydraulic  system.  The  failure 
mode  is  one  that  causes  a hydraulic  pump  seizure.  This 
situation  could  potentially  be  caused  by  a sudden  large  rupture 
of  a hydraulic  fluid  line.  Should  a seizure  of  the  hydraulic 
pump  occur,  the  kinetic  energy  of  the  system  could  possibly 
cause  a rupture  of  the  HPU  turbine  rotor.  This  is  represented 
by the question "HPU turbine remains intact?" in Figure 9.3-1.
Thus  the  HPU  turbine  functionally  depends  on  the  avoidance  of 
catastrophic  hydraulic  pump  seizure.  Of  course,  a more  obvious 
functional  dependency  is  that  the  hydraulic  system  pump 
operation  depends  on  HPU  operation. 

An  example  of  a scenario  that  includes  cascading  damage  is  shown 
if  the  HPU  turbine  is  not  intact.  A negative  answer  to  the 
question  "HPU  turbine  remains  intact?"  means  that  the  turbine 
rotor  has  come  apart  and  the  pieces  have  not  been  contained. 

In  that  situation,  the  HPU  has  failed  and  it  allows  hydrazine 
to  escape  into  the  aft  skirt.  Questions  about  cascading  damage 
concern  such  items  as  whether  there  is  sufficient  oxygen  in  the 
aft  skirt  to  support  combustion,  whether  other  conditions 
necessary  for  a fire  are  present,  whether  autodecomposition  of 
hydrazine  will  cause  further  damage,  and  whether  shrapnel  from 
the  rotor  itself  will  cause  further  damage  to  the  other  HPU  or 
"flight  critical  equipment"  in  the  aft  skirt.  The  term  flight 
critical  equipment  is  defined  for  this  study  to  be  any  component 
or  groups  of  components  that  are  not  part  of  the  HPU  and  whose 
failure  directly  causes  a LOC/V  in  conjunction  with  previous 
failures  in  the  scenario.  More  detailed  discussions  of  phenomena 
related to cascading damage are provided in Section 9.6.


9.3.1.3 Modeling Spatial Interaction Events in an ESD

Spatial  interaction  events  (SIE)  denote  potential  failures  of 
equipment  by  virtue  of  their  spatial  proximity  to  the  phenomena 
such  as  shrapnel  and  hydrazine  reactions  that  tend  to  lead  to 
cascading  damage.  The  spatial  interaction  phenomena  considered 
in  this  study  are  as  follows: 

1.  Hydrazine  reaction  with  materials  in  the  aft  skirt 
causing  deterioration  of  either  wire  insulation  or  other 
material  in  the  aft  skirt  following  hydrazine  leakage 

2.  Exothermic  hydrazine  decomposition  reaction  in  an  oxygen 
poor  environment  following  hydrazine  leakage 

3.  Fire in the aft skirt caused by hydrazine combustion
following hydrazine leakage. This is assumed to be of negligibly
small likelihood because the environment in the aft skirt
is made inert with nitrogen. Consideration of ground crew
failures, in particular failure to purge the aft skirt, is
beyond the scope of this study.

4.  Shrapnel  caused  by  turbine  rotor  failure  at  either  normal 
speed  or  turbine  runaway  conditions 

5.  Detonations caused by compression of hydrazine bubbles,
leakage into solenoids of the fuel isolation or control
valves, or heating of hydrazine in the fuel tank or fuel
lines caused by a hydrazine decomposition reaction

6.  Leakage  of  hot  gas  into  the  aft  skirt  caused  by  exhaust 
duct  failure 

The  ESD  also  recognizes  that  certain  failures  may  cascade  and 
cause others. For example, shrapnel generated by turbine rupture
or hydrazine detonation could cause hydrazine leakage
into  the  aft  skirt  which,  in  turn,  could  result  in  a 
decomposition  reaction  which,  in  turn,  could  cause  another 
detonation,  etc.  A more  detailed  discussion  of  the  damage 
potential  of  these  events  is  found  in  Section  9.6. 

Below  the  SIE  in  Figure  9.3-1  is  a triangle  with  a Greek 
character  printed  within.  This  denotes  a transfer  to  another 
place  in  the  ESD  that  has  another  triangle  with  the  same 
character within. The SIE diagram asks questions concerning
the number of HPUs that have failed and if flight critical
equipment  has  failed  as  a result  of  the  phenomena  contributing 
to  spatial  interactions. 

The  ESD  asks  if  spatial  interactions  have  failed  flight  critical 
equipment.  Then  the  ESD  asks  if  two  HPUs  have  failed  as  a result 
of  the  initial  failure  and  the  spatial  interaction.  The  model 
assumes  a LOC/V  if  either  occurs. 

Finally  the  ESD  asks  if  two,  one  or  no  HPUs  have  failed  as  a 
result  of  the  initial  failure,  spatial  interaction,  and  potential 
independent  failure  of  another  HPU. 
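
The three SIE questions described above can be sketched as a small function. This is a minimal illustration under stated assumptions: the function name `sie_outcome` and its parameters are hypothetical, and the inputs are taken as already known, whereas the study assigns frequencies to them later.

```python
def sie_outcome(flight_critical_failed, hpus_failed, independent_failure=False):
    """Walk the SIE questions for one SRB (illustrative only).

    flight_critical_failed: True if spatial interactions failed flight
        critical equipment.
    hpus_failed: HPUs failed by the initial failure plus the spatial
        interaction (0, 1 or 2).
    independent_failure: True if another HPU also fails independently.

    Returns (loc_v_assumed, total_hpus_failed).
    """
    if not 0 <= hpus_failed <= 2:
        raise ValueError("an SRB has two HPUs")
    # First two questions: the model assumes LOC/V if either occurs.
    if flight_critical_failed or hpus_failed == 2:
        return True, hpus_failed
    # Final question: count HPUs failed once a potential independent
    # failure of another HPU is included (capped at the two HPUs present).
    total = min(2, hpus_failed + (1 if independent_failure else 0))
    return False, total


if __name__ == "__main__":
    print(sie_outcome(False, 1))
    print(sie_outcome(False, 1, independent_failure=True))
    print(sie_outcome(True, 0))
```

The sketch only reports the count from the final question. Assigning a damage state to that count is a separate step in the ESD.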

We  have  so  far  given  examples  of  how  an  ESD  diagrams  functional 
dependencies,  cascading  damage,  and  spatial  interactions. 
Independent  failures  are  diagrammed  in  a similar  manner. 

Although the combination of two or more failures occurring
independently  is  probably  of  lower  frequency  than  dependent 
failures,  the  ESD  recognizes  their  potential.  (The  PRA 
assesses  the  frequency  of  the  scenarios  by  the  use  of  event 
trees,  split  fraction  models,  and  data  later  in  the  study.) 

Suppose,  for  example,  that  an  HPU  fails  because  of  a problem  in 
the gearbox, but the remaining HPU is OK and able to support
both  hydraulic  actuators.  That  same  HPU  could  also  be  leaking 
hydrazine.  The  transfer  triangle  with  a "l"  within  leads  to  the 
next  question,  which  is  about  whether  the  hydrazine  fuel  boundary 
remains  intact.  A leakage  in  this  scenario  (that  is,  following 
a gearbox  failure  but  with  no  other  failures)  would  be  a second 
failure  of  the  same  HPU,  occurring  independently;  that  is,  not 
caused  by  or  related  to  the  gearbox  failure. 

All  scenarios  in  the  HPU  ESD  ask  if  either  hydrazine  leakage  or 
exhaust  gas  leakage  or  both  can  occur.  This  recognizes  that 
virtually  any  HPU  malfunction  or  failure  can  also  be  accompanied 
by  the  initial  failure  categories  of  hydrazine  and  exhaust  gas 
leakage. 

The  ESD  accounts  for  two  HPUs  in  an  SRB  and  diagrams  scenarios 
in  which  failures  can  occur  in  more  than  one  HPU  in  the  same 
mission. The shadow boxes of the initial failure categories
across the top of Figure 9.3-1 illustrate the diagrammatic
device used to represent this. The diagram is read left to
right  for  each  HPU.  Scenarios  for  the  HPUs  of  the  two  SRBs  are 
considered  to  be  completely  identical  to  each  other  but  to 
occur independently.

In summary, an ESD is capable of exhibiting scenarios that
include  failures,  malfunctions,  multiple  subsystems,  dependent 
events,  cascading  damage,  spatial  interactions,  human  actions, 
and  damage  states  for  each  stage  of  the  mission.  The  remainder 
of  Section  9.3  describes  the  events  found  in  the  HPU  ESD.  Since, 
as  discussed  above,  hydraulic  system  failures  are  included  for 
illustrative  purposes  only,  the  following  discussion  will  not 
include  hydraulic  system-initiated  scenarios. 


9.3.2 HPU Scenarios from L/O-30 Seconds to SRB SEP

The ESD in Figure 9.3-1 covers the mission from L/O-30
seconds, when the HPU starts, to HPU shutdown at the time of SRB
SEP, about 2.1 minutes after launch.


9.3.2.1 Scenarios Initiated by Permanent HPU Failure Category

This  initiating  failure  category  includes  a number  of  failures 
of  HPU  equipment.  It  includes,  for  example,  failure  to  start 
the  HPU,  failures  of  the  pump,  valves,  turbine,  and  gearbox  to 
continue  running,  plugging  of  the  lube  oil  system  and  plugging 
of  the  fuel  line.  A complete  description  of  all  initiating 
failures included in the model of this category is found in
Section  9.5.2.  This  category  does  not  include  hydrazine 
leakages  to  the  aft  skirt  or  into  valve  solenoids.  It  does 
not  include  turbine  runaway  events. 

The gearbox and the turbine have been singled out for additional
attention in the diagram because certain failure modes of these
components could potentially lead to spatial interaction events.
The following describes the scenarios in Figure 9.3-1 that are
beneath  the  box  with  the  question  "No  permanent  failures?". 

The  next  event  beneath  this  category  asks  if  the  gearbox  is  OK. 
This  event  includes  all  failure  modes  of  the  gearbox.  A negative 
answer  to  this  question  could  mean  that  the  gearbox  has  failed 
in a way that could cause rapid seizure of the turbine shaft.

The  question  "HPU  turbine  remains  intact?"  is,  therefore,  asked. 

A negative  answer  means  that  the  gearbox  failure  may  (or  may 
not)  have  caused  an  energetic  failure  of  the  turbine  rotor  with 
subsequent  failure  to  contain  the  pieces  within  the  HPU.  If 
the  gearbox  is  OK,  then  the  ESD  asks  about  independent  turbine 
failure  at  normal  turbine  speed.  If  the  HPU  turbine  remains 
intact, then the diagram asks if the remaining HPU is OK and can
adequately support both hydraulic actuators. A key contributor
to this question is whether the remaining HPU switches to high
speed mode.

If  the  turbine  does  not  remain  intact,  the  same  questions  related 
to  cascading  failure  phenomena  and  spatial  interaction  events  as 
those described in Sections 9.3.1.2 and 9.3.1.3 become relevant
in  order  to  describe  the  various  sequences  of  events  that  could 
arise  from  turbine  failure.  Tracing  through  the  ESD  from  page  1 
of  Figure  9.3-1  to  page  2 of  that  figure,  the  diagram  recognizes 
that,  indeed,  further  damage  might  not  occur  to  other  HPUs  and 
flight  critical  equipment,  leaving  only  the  initial  failure  of 
an HPU. It is also recognized that subsequent failures occurring
as  a consequence  of  shrapnel  and  hydrazine  leakage  could  lead  to 
a LOC/V. 

9.3.2.2 Scenarios Initiated by Turbine Speed Control Failure
Category

This initial failure category includes all failures that cause an
overspeed of the HPU turbine. The combinations of control valve,
controller, electric power and other failures contributing to
turbine overspeed are in the split fraction models described in
Section 9.5.2.1.

Unlike the APU, there is no HPU overspeed trip to prevent a turbine
runaway. Therefore, the next question to be asked is if the
HPU turbine remains intact (i.e., does not come apart or contains
the rotor pieces). If both the primary and secondary valves fail
open, then turbine speed would be expected to reach over 136,000
rpm in about 200 milliseconds. At this speed the HPU turbine is
unlikely to remain intact. The expected event is that the turbine
rotor would come apart in three pieces and the pieces would not be
contained by the containment ring mounted inside the turbine
housing. Shrapnel would enter the aft skirt, accompanied by
hydrazine, which would escape the HPU through the holes created by
the pieces of turbine rotor. The shrapnel would tend to spray a
pattern subtending a 30 degree arc centered on the turbine rotor
plane of rotation. Some of the shrapnel could be
quite energetic, enough to damage flight critical electrical/
electronic equipment in the aft skirt, compartment bulkheads, and
HPU fuel tanks.

Hydrazine  leakage  would  not  be  expected  to  cause  a fire  in  the 
aft  skirt  because  the  compartment  is  purged  with  nitrogen  and 
low atmospheric oxygen conditions are quickly attained as the
shuttle gains altitude. Hydrazine is capable of an exothermic
decomposition  reaction  that  tends  to  strip  insulation  from  wires 
and  could  cause  heatup  and  detonation  of  hydrazine  in  other  HPUs. 
The  potential  for  LOC/V  dramatically  increases  if  a hydrazine 
fuel  tank  is  punctured,  thereby  flooding  the  area  around  the 
HPUs  with  hydrazine.  More  detailed  discussion  about  individual 
phenomena  is  presented  in  Section  9.6. 


9.3.2.3 Scenarios Initiated by Hydrazine Leakage Category

This  initial  failure  category  includes  hydrazine  leakage  from 
any  part  of  the  HPU  into  the  aft  skirt,  into  the  fuel  pump 
seal  drain  line,  and  into  the  isolation  valve  or  control  valve 
solenoids. The situation in which hydrazine contaminates and
causes blockage of lube oil flow is included within the permanent
failure category. Scenarios resulting from hydrazine
leakage follow a negative answer to the question "Fuel boundary
remains intact?". They are described below.

The notation "possible hydrazine attack" refers to the highly
corrosive property of hydrazine and its autodecomposition
property. Certain materials in the aft skirt such as wire
insulation serve as catalysts such that, at sufficiently high
temperatures, hydrazine will decompose into
nitrogen, hydrogen and ammonia. Unfortunately, operating HPUs
provide surfaces of sufficient temperature to initiate this
reaction. Furthermore, materials inside the aft skirt such as
Kapton wire insulation are subject to rapid deterioration under
contact with hydrazine.

A negative answer to "Fuel boundary remains intact?" leads to
the  question  of  whether  the  leakage  is  severe  enough  to  deplete 
the  fuel  before  SRB  SEP.  In  such  a severe  case,  the  ESD  asks  if 
the  remaining  HPU  can  support  the  remainder  of  the  mission.  If 
not,  loss  of  SRB  hydraulic  control  is  assumed  to  cause  a LOC/V. 

For less severe leaks, and for the situation in which the
remaining HPU is adequate, questions are asked about whether
conditions adequate for a fire are present.

Whether or not a fire occurs (one would not be expected), the
ESD  transfers  to  the  SIE  questions  to  decide  on  the  potential 
further  damage  caused  by  escaping  hydrazine.  This  part  of  the 
ESD was covered in Section 9.3.1.3. More discussion on the
damage potential of hydrazine is presented in Section 9.6.


9.3.2.4 Scenarios Initiated by Exhaust Gas Leakage Category

This  category  includes  failures  in  the  exhaust  gas  duct  that 
allow  hot  gas  to  flow  into  the  aft  skirt.  It  also  includes 
failure  of  a small  high  pressure  transducer  line  downstream  of 
the turbine. Damage to HPUs and flight critical equipment may
be caused by a large leak such that hot gas impingement on
electronic equipment may cause failures of components. Such
situations are considered extremely unlikely.

Since exhaust duct leakage itself does not fail an HPU, the ESD
models all potential scenarios from this initial failure category
as spatial interaction events. These have been described in
Section 9.3.1.3.


9.3.2.5 Defining the Damage States for HPU Scenarios

The  logic  used  to  define  damage  states  associated  with  HPU 
initiated  scenarios  is  summarized  in  the  part  of  the  ESD  with 
the designator "AD".

If any failures occur, any leakage is detected, or any redlines are
violated before launch, then the scenario would lead to a launch
scrub. If an HPU fails after launch (a yes answer to "After
launch?"), then LOC/V is assumed to occur if either a second HPU
fails or the second HPU fails to switch to high speed mode. These
scenarios and damage states apply to the two HPUs in either SRB.
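
The damage state logic of this paragraph can be sketched as follows. The function name `damage_state` and its parameters are hypothetical. The sketch also assumes that a single HPU failure after launch, with the second HPU running and switched to high speed mode, counts as "OK" at the ESD level; the text implies this by listing only three ESD damage states (OK, launch scrub, LOC/V), but the paragraph does not state it outright.

```python
def damage_state(any_failure, after_launch,
                 second_hpu_failed=False, high_speed_switch_ok=True):
    """Assign an ESD damage state for the HPUs of one SRB (illustrative).

    any_failure: an HPU failure, detected leak, or redline violation occurred.
    after_launch: the answer to the "After launch?" question.
    second_hpu_failed: the second HPU also failed.
    high_speed_switch_ok: the second HPU switched to high speed mode.
    """
    if not any_failure:
        return "OK"
    if not after_launch:
        # Anything that happens before launch ends in a launch scrub.
        return "launch scrub"
    if second_hpu_failed or not high_speed_switch_ok:
        return "LOC/V"
    # Assumption of this sketch: one failed HPU with a healthy second
    # HPU in high speed mode is a successful mission.
    return "OK"


if __name__ == "__main__":
    print(damage_state(True, after_launch=False))
    print(damage_state(True, after_launch=True, high_speed_switch_ok=False))
```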


9.3.3  Summary 

Section  9.3  discussed  the  event  sequence  diagram  used  to  develop 
and  illustrate  scenarios  that  begin  with  initial  failures  of  the 
HPU and eventually lead to one of three damage states: OK, launch
scrub,  and  LOC/V. 

Although  ESDs  are  useful  for  the  development  and  communication  of 
scenarios,  they  are  not  adequate  for  quantifying  the  risk  of  the 
HPU. Event trees and split fraction models are used for this and
are  discussed  in  the  next  two  sections. 


9.4  EVENT  TREE  FOR  HPU  INITIATED  SCENARIOS 

The  ESD  presented  in  the  previous  section  was  developed  to 
clearly describe the sequential flow of events for HPU-initiated
scenarios that could lead to LOC/V, launch scrub or a successful
mission.  An  event  tree  was  developed  from  the  ESD  to  facilitate 
quantification  because  computer  techniques  are  available  for 
obtaining  the  frequency  of  scenarios  expressed  in  the  form  of 
event  trees.  Because  quantification  is  the  goal  of  an  event  tree, 
the  top  events  need  not  have  a one-to-one  correspondence  with  the 
boxes  in  the  event  sequence  diagram,  and  the  top  events  need  not 
be  shown  from  left  to  right  in  their  expected  order  of  occurrence. 
Instead,  the  top  events  represent  either  a group  of  boxes  in 
the  ESD  or  a breakdown  of  an  individual  box.  Their  order  is 
established  to  best  capture  the  inter-event  dependencies  and 
facilitate  the  development  of  scenario  dependent  split  fractions. 
The  construction  of  an  event  tree  depends  on  the  analysts'  skill 
and  experience,  knowledge  of  the  data,  and  knowledge  of  the  split 
fraction  models.  The  objective  is  to  best  utilize  the  available 
data  to  obtain  an  accurate  estimate  of  the  frequency  of  each 
scenario. 
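
As a rough illustration of why the event tree form lends itself to computation, the sketch below uses the usual event tree arithmetic: the frequency of a sequence is the initiating event frequency multiplied by the split fractions along its path, and the frequency of a damage state is the sum over the sequences that end in it. This is a generic sketch, not the software used in the study. The function name `damage_state_frequencies`, the data layout, and every number in the example are hypothetical.

```python
from collections import defaultdict
from math import prod


def damage_state_frequencies(initiator_frequency, sequences):
    """Sum sequence frequencies by damage state.

    sequences: iterable of (split_fractions, damage_state) pairs, where
    split_fractions lists the branch fractions along one path of the tree.
    """
    totals = defaultdict(float)
    for split_fractions, state in sequences:
        totals[state] += initiator_frequency * prod(split_fractions)
    return dict(totals)


if __name__ == "__main__":
    # Made-up three-sequence tree: no failure, a failure before launch,
    # and a failure after launch that cascades.
    example = [
        ([0.99], "OK"),
        ([0.01, 0.9], "launch scrub"),
        ([0.01, 0.1], "LOC/V"),
    ]
    for state, freq in damage_state_frequencies(1.0, example).items():
        print(state, freq)
```

Because each path carries its own split fractions, a fraction can depend on the events that precede it in the path, which is the kind of scenario dependence the ordering of top events is chosen to capture.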

The  HPU  event  tree  is  shown  in  Figure  9.4-1.  It  consists  of  the 
initial  event,  which  is  the  attempted  start  of  the  HPUs  in  one 
SRB,  followed  by  13  top  events,  and  ending  with  the  damage  state  of 
each  sequence.  The  damage  state  is  represented  by  an  "X"  beneath 
one of four possibilities: loss of crew or vehicle (LV), launch
scrub (LS), one HPU failed but the mission successful (HP), and no
HPUs failed (OK). Taken together, a line of X's at the end of a
sequence  is  called  a damage  vector.  Each  sequence  is  associated 
with  a damage  vector  and  two  or  more  sequences  can  have  the  same 
damage  vector.  A transfer  in  the  tree  (e.g.  XFR1)  means  that  the 
dotted  line  is  to  be  replaced  by  a previously  defined  group  of 
sequences  with  their  associated  damage  vectors.  For  example,  the 
dotted  lines  that  end  with  XFR1  are  to  be  replaced  by  the  group  of 
sequences and their associated damage vectors to the right of the
"X1" mark beneath top event "FH".


9.4.1  Relationship  of  ESD  to  Event  Tree 

Table 9.4-1 presents a summary description of each top event.
Table 9.4-2 relates each top event to one or more ESD questions.

9.4.2  Overview  of  the  Event  Tree 

FIGURE 9.4-1 HPU EVENT TREE

The sequences in Figure 9.4-1 may be thought of as falling into
five categories:

1.  Sequences numbered 1 through 32 are characterized by
spatial interaction failures associated with hydrazine
leakages.

2.  Sequences  numbered  33  through  56  are  characterized  by 
spatial  interaction  failures  associated  with  combinations 
of hydrazine leakage and exhaust gas leakage.

3.  Sequences  numbered  57  through  78  are  characterized  by 
equipment  failures  in  one  HPU  combined  with  spatial 
interaction  failures  associated  with  hydrazine  leakage 
and  exhaust  gas  leakage. 

4.  Sequences  numbered  79  through  82  are  characterized  by 
equipment  failures  in  both  HPUs  or  by  turbine  rupture  in 
one  HPU  causing  a shrapnel  or  hydrazine  induced  failure 
of  the  second  HPU  or  other  flight  critical  equipment. 

5.  Sequences  numbered  83  to  108  are  characterized  by  turbine 
overspeed  induced  shrapnel. 

The assumptions, groundrules and approximations used to construct
the tree are:

1.  HPU  failure  is  defined  as  the  inability  to  power  its 
associated  hydraulic  pump  to  the  extent  that  the  second 
HPU  must  operate  at  higher  speed  in  order  to  provide 
sufficient  pressure  to  the  hydraulic  actuators. 

2.  Two  HPU  failures  in  a single  SRB  lead  to  loss  of  crew  or 
vehicle (LV). A second HPU is considered failed if it
does  not  shift  into  high  speed  following  failure  of  the 
first  HPU. 

3.  The  total  frequency  of  each  damage  state  for  both  SRBs  is 
assumed  to  be  twice  the  frequency  of  that  damage  state 
minus  the  damage  state  frequency  squared,  where  the  damage 
state  frequency  is  calculated  from  Figure  9.4-1. 

4.  The  event  tree  is  to  be  quantified  from  TIG-30  seconds  to 
SRB  SEP. 

5.  Large  hydrazine  leakages  are  defined  as  leaks  for  which 
the HPU will deplete all fuel and thereby fail before SRB
SEP.


TABLE 9.4-1

TOP EVENT DEFINITIONS - HPU EVENT TREE

SYMBOL    DEFINITION

IE        Demand for HPU Start

HY        Hydraulic System Failure*

TH        Turbine Overspeed

PH        Equipment Failure of One HPU After it Starts

DH        Failure of the Second HPU After it Starts

CH        Failure of the Second HPU or Failure of Flight
          Critical Equipment Due to Spatial Interactions
          Initiated by Failure of the First HPU

HH        Failure of One HPU Due to Exhaust Gas Leak

GH        Failure of Flight Critical Equipment of the
          Second HPU Due to Exhaust Gas Leak

KA        Leakage of Hydrazine from HPU A

KB        Leakage of Hydrazine from HPU B

FH        Failure of Flight Critical Equipment or Two
          HPUs Due to Spatial Interactions Initiated
          by Hydrazine Leakage

BA        Hydrazine Leakage Causes Failure of HPU A,
          Given That Two HPUs Have Not Failed

BB        Hydrazine Leakage Causes Failure of HPU B,
          Given That Two HPUs Have Not Failed

BH        Failure of One or Two HPUs Upon Start or
          While Running Before Launch

*This Top Event is Included to Show How an Event Tree
Can Include Scenarios that Cross Subsystem Boundaries.
Quantitative Evaluation of the Hydraulic System is
Out-of-Scope.


TABLE 9.4-2

RELATIONSHIP OF TOP EVENTS TO HPU ESD

TOP EVENT QUESTIONS FROM FIGURE 9.3-1

HY        "Hydraulic System OK?" and All Boxes Beneath That Question

TH, DH    "Turbine Control OK" and All Boxes Beneath That Question

PH, DH    "No Permanent HPU Failure" and All Boxes Beneath That
          Question

CH        All Questions Following "SIE". They Include:

          "SIE Does Not Fail Flight Critical Equipment"

          "SIE and Initial Failure Does Not Cause Two HPUs to Fail"

          "SIE and Initial Failure Does Not Cause the Second HPU to
          Fail With One Already Failed"

          The Questions Relate to Spatial Interactions That Could
          Follow Failures Involving Shrapnel

HH, GH    "Exhaust Gas Boundary Remains Intact?" and All Spatial
          Interaction Questions Beneath It. The Spatial Interaction
          Questions Now Refer Only to the Damage Potentially
          Caused by Exhaust Gas Release

KA, KB    "Fuel Boundaries Remain Intact"

FH        All Questions Following "SIE". The Spatial Interaction
          Questions Now Refer to the Damage Potentially Caused by
          Hydrazine in the Aft Compartment to Flight Critical
          Equipment or HPUs

BA, BB    "Sufficient Fuel Left for HPU Ascent" and All Questions
          Following "SIE". These Spatial Interaction Questions
          Now Refer to the Damage Potentially Caused by Hydrazine
          in the Aft Compartment to an Individual HPU

BH        The Question "After Launch" and All Questions Following
          It. This Top Event Determines the Fraction of Each
          Scenario That Occurs Before or After Launch. It is Used
          to Decide on Whether the Scenario Ends in Launch Scrub
          or LOC/V


6.  All failures that occur before launch are assumed to lead
to  launch  scrub.  The  potential  for  HPU  failures  to  cause 
loss  of  crew  or  vehicle  while  sitting  on  the  pad  is 
considered  negligibly  small. 

7.  The  HPUs  are  assumed  to  be  identical  and  spatially 
symmetrical  to  each  other  so  that  frequencies  and 
consequences  are  independent  of  which  HPU  has  failed. 
Therefore,  HPU  B has  been  assigned  as  the  failed  HPU 
with  no  loss  of  generality  or  quantitative  accuracy 
when  TH,  PH,  or  HH  fail. 

8.  The  possibility  of  two  HPUs  failing  independently  in  the 
same  flight  from  turbine  overspeed  is  not  modeled  because 
the  frequency  of  this  sequence  is  much  smaller  than  the 
frequency  of  sequences  leading  to  loss  of  crew  or  vehicle 
that  involve  one  turbine  overspeed  with  other  failures. 

9.  The  frequency  of  failure  of  a running  HPU  before  launch 
is  approximated  by  the  ratio  of  the  time  it  runs  before 
launch to the total time from L/O-5 to SRB SEP. All
start  failures  are  modeled  as  occurring  before  launch. 
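
Assumptions 3 and 9 are simple arithmetic and can be sketched directly. The function names `both_srbs` and `prelaunch_fraction` are hypothetical, and the numbers in the example are illustrative only; they are not results of the study. Assumption 9 is read here as giving the fraction of running failures assigned to the period before launch.

```python
def both_srbs(f):
    """Assumption 3: damage state frequency for both SRBs.

    f is the per-SRB frequency of one damage state from the event tree.
    Twice f minus f squared equals 1 - (1 - f)**2, the probability that
    at least one of two identical, independent SRBs reaches that state.
    """
    return 2.0 * f - f * f


def prelaunch_fraction(run_time_before_launch, total_run_time):
    """Assumption 9: share of HPU running time that falls before launch."""
    if total_run_time <= 0:
        raise ValueError("total_run_time must be positive")
    return run_time_before_launch / total_run_time


if __name__ == "__main__":
    # Illustrative per-SRB frequency of 1 in 100.
    print(both_srbs(0.01))
    # Illustrative timing: 30 s of running before launch and 126 s
    # (about 2.1 minutes) from launch to SRB SEP.
    print(prelaunch_fraction(30.0, 30.0 + 126.0))
```

For small f the squared term is negligible and the two-SRB frequency is close to 2f, which is why the correction matters only for the more frequent damage states.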


9.4.3 Description of Top Events

A summary  description  of  each  top  event  and  its  relationship 
to  the  rest  of  the  event  tree  is  provided  in  this  section. 

The  detailed  model  that  provides  the  basis  for  assessing  the 
frequency  of  occurrence  of  each  top  event  split  fraction  is 
provided  in  Section  9.5.  The  data  required  by  these  models 
is  described  in  Section  10. 

Top  Event  HY:  Hydraulic  System  Failure 

This  event  is  included  as  an  illustration  of  how  an  event  tree 
can  include  scenarios  that  cross  subsystem  boundaries.  A failure 
of  HY  implies  that  its  associated  HPU  is  useless.  The  event  tree, 
therefore,  treats  HY  failure  as  if  an  HPU  had  failed. 

Top Event TH: Turbine Overspeed

This event occurs if both the primary and secondary control valves
fail in the open position while the HPU is operating. Mechanical,
electrical and controller causes are included. Turbine overspeed
implies that the HPU has failed. It is then appropriate to ask if
the resulting shrapnel and hydrazine escape have caused a second
HPU or other flight critical equipment in the aft compartment
(i.e., Top Event CH) to fail. The tree also asks if the other HPU
could have failed independently from the turbine overspeed either
by equipment failure (e.g., Top Event DH) or by leakages. Occurrence
of this event after launch and in the absence of other
failures leads to the HP damage state.

Top Event PH: HPU Equipment Failure After HPU Start

This event occurs if any piece of equipment fails, or combinations of
equipment failures combine, to prevent an HPU from providing sufficient
power to its hydraulic pump as defined above. For example, this
event  includes  failure  of  the  turbine  rotor  at  normal  speed. 

This  event  excludes,  however,  turbine  overspeed,  leakages,  and 
start  failures.  This  top  event  does  not  include  failures  caused 
by  erroneous  commands  from  sources  external  to  the  HPU  (e.g.,  the 
GPCs). Such failures are outside the scope of this study. The
combinatorial  failures  included  in  this  top  event  are  described  in 
detail  in  Section  9.5.  Occurrence  of  this  event  after  launch  and 
in  the  absence  of  other  failures  leads  to  the  HP  damage  state. 

Top  Event  DH:  Failure  of  Second  HPU  After  HPU  Start 

This  event  is  asked  if  either  PH  occurs  or  TH  occurs.  It 
occurs  if  the  second  HPU  fails  given  that  one  HPU  is  known  to 
have  failed.  The  same  combinations  of  equipment  failures  that 
contribute  to  PH  are  also  relevant  here.  Occurrence  of  this 
event  after  launch  leads  to  loss  of  crew  or  vehicle. 

Top Event CH: Spatial Interaction Failure of Second HPU or
Flight Critical Equipment

This  event  includes  failure  of  the  second  HPU  or  flight  critical 
equipment  due  to  shrapnel  or  hydrazine  induced  cascading  damage. 

It  considers  the  possibility  that  shrapnel  and  hydrazine  could 
be  produced  by  turbine  rotor  failure  either  in  an  overspeed  or 
normal  speed  condition.  Occurrence  of  this  event  after  launch 
leads  to  loss  of  crew  and  vehicle. 

Top  Event  HH:  Exhaust  Gas  Leakage  Fails  One  HPU 

This  event  includes  the  possibility,  no  matter  how  remote,  that 
exhaust  gas  leakage  can  fail  an  HPU.  Occurrence  of  this  event 
after  launch  and  in  the  absence  of  other  failures  leads  to  the 
HP  damage  state.

Top  Event  GH:  Exhaust  Gas  Leakage  Fails  Second  HPU

This  event  includes  the  possibility  that  exhaust  gas  leakage 
fails  a second  HPU,  given  that  one  HPU  is  known  to  have  failed 
from  exhaust  gas  leakage  or  from  other  causes.  Occurrence  of 
this  event  after  launch  leads  to  loss  of  crew  and  vehicle. 

Top  Event  KA:  Hydrazine  Leakage  in  HPU  A

This  event  includes  leakages  of  hydrazine  from  anywhere  in  HPU  A 
to  the  aft  skirt. 

Top  Event  KB:  Hydrazine  Leakage  in  HPU  B

This  event  includes  leakages  of  hydrazine  from  anywhere  in  HPU  B 
to  the  aft  skirt.  The  event  tree  structure  involving  KA  and  KB 
includes  all  combinations  of  HPUs  leaking  individually  or  together 
in  the  same  mission. 

Top  Event  FH:  Leakage  Induced  Failure  of  Both  HPUs  or  Flight  Critical  Equipment

This  event  includes  those  spatial  interactions  due  to  the  presence
of  hydrazine  in  the  aft  skirt  around  the  HPUs  that  cause  failure
of  both  HPUs  or  other  flight  critical  equipment.  Occurrence  of
this  event  after  launch  leads  to  loss  of  crew  and  vehicle.

Top  Event  BA:  Leakage  Induced  Failure  of  HPU  A

This  event  includes  spatial  interaction-induced  failure  of  HPU  A 
from  the  presence  of  hydrazine  in  the  aft  skirt,  given  that  two 
HPUs  have  not  failed.  Occurrence  of  this  event  after  launch  and 
in  the  absence  of  other  failures  leads  to  the  HP  damage  state. 

Top  Event  BB:  Leakage  Induced  Failure  of  HPU  B

This  event  includes  spatial  interaction-induced  failure  of  HPU  B 
from  the  presence  of  hydrazine  in  the  aft  skirt,  given  that  two 
HPUs  have  not  failed.  Occurrence  of  this  event  after  launch  and 
in  the  absence  of  other  failures  leads  to  the  HP  damage  state. 

Top  Event  BH:  Failure  Occurs  Before  Launch

This  event  includes  all  combinations  of  start  failures  of  either 
or  both  HPUs.  It  also  includes  that  portion  of  running  failures 
of  either  or  both  HPUs  that  occurs  before  launch.  Occurrence  of 
this  event  leads  to  the  launch  scrub  damage  state.

9.5  SPLIT  FRACTION  MODEL  DEVELOPMENT


9.5.1  Introduction 

A guiding  principle  for  the  modeling  and  computational  effort  was 
to  place  more  emphasis  and  detail  in  those  aspects  of  the  model 
that  promised  to  be  important  to  risk.  This  meant,  for  example, 
that  many  scenarios  involving  large  numbers  of  failure  occurrences
would  not  be  important  because  of  their  low  associated
probabilities.  Such  scenarios  can  be  quickly  estimated  by  a 
preliminary  analysis  using  a general  knowledge  of  the  model  and 
the  basic  event  data.  It  was  not  difficult,  for  example,  to 
estimate  the  order  of  magnitude  of  the  total  LOC/V  frequency 
from  a knowledge  of  the  event  tree,  HPU  design,  and  the  failure 
history  database,  without  going  through  the  formal  computer 
analysis.  In  some  cases,  however,  knowledge  to  make  such  initial 
assessments  was  not  available  to  the  team  until  late  in  the 
study.  It  was  necessary  to  include  such  events  in  the  analysis. 
One  of  the  most  prominent  examples  is  the  case  of  consequential 
permanent  failures  resulting  from  exhaust  gas  leaks.  Exhaust  gas 
leaks  were  identified  in  the  master  logic  diagrams  as  an 
initiating  failure  and  were,  therefore,  included  in  the  event 
trees.  Their  frequency  of  occurrence  and  the  conditional  probability
of  consequential  failure  of  an  HPU  were  not  assessed
until  models  were  under  development.  Their  contribution  to  risk
was  determined  to  be  negligibly  small  (less  than  0.1  percent  of
the  total  LOC/V  frequency).  The  exhaust  models  are,  therefore,
more  complex  than  necessary.

In  developing  the  interrelated  event  tree  and  fault  tree  models, 
it  was  also  necessary  to  strike  a balance  in  modeling  complexity 
between  these  two  types  of  logic  trees.  This  was  an  iterative
process  that  began  by  developing  a simple  first-cut  event  tree
and  its  associated  fault  trees.  The  fault  trees  were  found  to
be  too  complex  to  be  analyzed  easily.  This  led  to  a more
complex  event  tree,  and  the  associated  fault  trees  were  found  to 
be  much  more  tractable.  This  iterative  process  was  continued 
until  a reasonable  balance  was  achieved. 

The  fault  tree  construction  was  influenced  by  data  availability.
As  discussed  in  Section  5.0,  it  is  pointless  to  model  components
at  a level  below  that  for  which  data  exists.  Furthermore,  the 
availability  of  data  in  a particular  form  influences  the  way  basic 
events  are  expressed  in  the  fault  tree.  The  process  of  split 
fraction  modeling  is  iterative  and  highly  interactive  with  the 
event  tree  development  and  data  analysis  process.

As  indicated  in  Section  9.4,  the  event  tree  for  the  HPU  is  a logic
diagram  that  shows  the  various  admissible  combinations  of  top 
event  occurrences  and  nonoccurrences  that  constitute  the  various 
scenarios  to  be  analyzed.  In  order  to  be  able  to  compute  the 
scenario  occurrence  frequencies,  it  is  necessary  to  compute  the 
appropriate  split  fractions  for  the  top  events  appearing  in  each 
scenario.  In  some  cases,  these  split  fractions  are  single  numbers
determined  from  all  available  evidence,  as  described  in  Section
5.0.  In  other  cases,  however,  the  top  events  represent  a substantial
part  of  the  HPU,  and  the  corresponding  split  fractions
were  computed  from  fault  tree  analyses.  The  paragraphs  that
follow  describe  the  fault  trees  that  were  developed  for  calculating
the  split  fractions  for  the  event  tree  top  events.  The
outcome  of  the  split  fraction  models  when  evaluated  by  the  data 
for  the  basic  events  is  a set  of  split  fraction  cause  tables  as 
described  in  Section  5. 
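
As a minimal sketch of how split fractions combine into a scenario frequency, assume that each scenario is one path through the event tree and that every split fraction is already conditioned on the branches to its left. Under that assumption the scenario frequency is the product of the branch probabilities along the path. The function name and all numerical values below are illustrative assumptions, not results from this study.

```python
from math import prod

def scenario_frequency(branches):
    """Multiply the branch probabilities along one event tree path.

    `branches` is a list of (split_fraction, occurred) pairs, one per top
    event on the path. An occurring event contributes its split fraction f;
    a non-occurring event contributes the complement 1 - f.
    """
    return prod(f if occurred else 1.0 - f for f, occurred in branches)

# Hypothetical path: first top event occurs, second does not, third occurs.
path = [(1.0e-3, True), (2.0e-2, False), (5.0e-1, True)]
freq = scenario_frequency(path)
print(f"scenario frequency = {freq:.3e}")
```

Treating a non-occurring event as the complement of its split fraction is what lets the scenarios leaving any one node sum to the probability of reaching that node.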

Before  describing  the  fault  trees,  it  is  appropriate  to  describe 
some  general  ground  rules,  assumptions,  and  analysis  considerations
that  are  fundamental  to  all  of  the  fault  trees.  One  of
the  assumptions  concerns  the  basic  symmetry  in  HPU  physical 
locations.  Because  there  is  no  fundamental  probabilistic 
importance  associated  with  HPU  location,  there  is  no  particular 
significance  to  the  name  of  an  HPU  that  fails.  That  is,  if  an 
unidentified,  unnamed  HPU  fails  in  conjunction  with  one  of  the 
top  events  in  the  event  tree  (call  that  event  E1),  then  that
failed  HPU  can  be  "named"  HPU  B without  any  loss  of  generality. 
The  actual  name  of  that  failed  HPU  is  of  no  importance  in 
determining  probabilities.  Consider  now  some  other  top  event 
(call  it  E2)  that  appears  to  the  right  of  event  E1  in  the  event
tree.  Fault  tree  models  can  now  be  constructed  for  event  E2  in 
which  the  failed  HPU  B does  not  appear.  This  represents  a great 
simplification  in  the  modeling  process.

After  some  preliminary  modeling  and  quantification  of  exhaust  duct
leakage,  it  was  concluded  that  exhaust  duct  leakage  would  be  a
negligible  contributor  to  loss  of  crew  and  vehicle.  The  reasons
for  this  are:

1.  The  frequency  of  occurrence  of  exhaust  duct  leakage
either  from  shrapnel  or  from  random  failure  is  very  low
(approximately  1E-5  per  hour  of  HPU  operation).

2.  Exhaust  duct  leakage  does  not  constitute  loss  of  an  HPU.

3.  The  probability  of  failing  an  HPU  or  flight  critical
equipment  in  the  aft  skirt  of  the  SRB  as  a consequence
of  exhaust  gas  impingement  is  quite  low  (approximately
1E-3  per  leak).

4.  We  would,  therefore,  expect  that  a LOC/V  owing  to  exhaust
gas  leak  would  occur  approximately  once  in  100  million
missions.
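
The text does not state how the "once in 100 million missions" figure was derived. One reading that reproduces it is a simple product of the two quoted values, treating the per-hour leak frequency as if it were a per-mission probability. That treatment is an assumption made here for illustration; since an HPU runs for only a few minutes per mission, it would overstate rather than understate the result.

```python
# Values quoted in items 1 and 3 of the list above.
leak_frequency_per_hour = 1.0e-5   # exhaust duct leak, per hour of HPU operation
p_failure_given_leak = 1.0e-3      # consequential failure probability, per leak

# Assumption (not from the source): use the per-hour frequency directly as a
# per-mission leak probability, without scaling by the actual HPU run time.
locv_per_mission = leak_frequency_per_hour * p_failure_given_leak
missions_per_locv = 1.0 / locv_per_mission

print(f"LOC/V frequency ~ {locv_per_mission:.1e} per mission")
print(f"about one in {missions_per_locv:.1e} missions")
```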

Rather  than  produce  a detailed  quantification  for  such  a remote 
occurrence,  we  chose  to  simplify  the  effort  and  assess  the 
frequency  of  all  scenarios  associated  with  exhaust  duct  leaks  as 
negligible,  even  though  a detailed  model  had  already  been 
developed.

Prior  analysis  experience  has  shown  that  common  cause  failures 
tend  to  be  important  risk  contributors  because  multiple  failures 
can  occur  as  a result  of  a single  failure  condition  common  to  two 
or  more  units.  Usually  this  is  at  a substantially  higher  probability
than  that  associated  with  multiple  independent  failures.
Hence,  it  was  important  to  include  such  potential  contributors
wherever  they  were  indicated  by  the  recorded  APU  and  HPU  failure
history  database. 

In  most  cases  the  fault  trees  are  intended  to  provide  probabilistic
results  that  serve  directly  as  the  split  fractions
for  their  associated  top  events.  In  some  cases,  however,  the
fault  trees  provide  intermediate  numerical  results  that  must
be  combined  with  the  numerical  results  of  other  models  to  obtain
the  required  top  event  split  fractions.  For  example,  two  consecutive
top  events  in  the  event  tree  in  Figure  9.4-1  are  labeled
PH  and  DH.  PH  represents  the  event  in  which  one  or  more  HPUs  have
a permanent  failure,  while  DH  represents  the  event  in  which  both
HPUs  fail  given  that  at  least  one  has  failed.  The  numeric  quantification
of  the  fault  tree  for  PH  yields  the  associated  split
fraction  directly.  However,  the  numeric  quantification  of  the 
fault  tree  for  DH  yields  the  probability  that  both  HPUs  fail. 

To  obtain  the  split  fraction  for  the  DH  event,  divide  the  DH 
result  by  the  PH  result,  thereby  giving  the  probability  of  both 
HPUs  failing  given  that  one  or  more  failures  are  known  to  have 
occurred.  This  type  of  analysis  also  applies  to  the  top  events 
HH  and  GH  in  that  same  event  tree. 
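
The division described above is the ordinary definition of conditional probability: because "both HPUs fail" is a subset of "at least one HPU fails", P(both | at least one) = P(both) / P(at least one). A minimal sketch follows; the function name and the numerical inputs are hypothetical and are not taken from the study's fault tree results.

```python
def conditional_split_fraction(p_both, p_at_least_one):
    """Return P(both fail | at least one fails) = P(both) / P(at least one)."""
    if not 0.0 < p_at_least_one <= 1.0:
        raise ValueError("P(at least one) must lie in (0, 1]")
    if not 0.0 <= p_both <= p_at_least_one:
        raise ValueError("P(both) must lie in [0, P(at least one)]")
    return p_both / p_at_least_one

# Hypothetical fault tree results, for illustration only.
p_ph_tree = 2.0e-3   # PH tree: at least one of two HPUs fails
p_dh_tree = 4.0e-6   # DH tree: both HPUs fail
dh_split_fraction = conditional_split_fraction(p_dh_tree, p_ph_tree)
print(f"DH split fraction = {dh_split_fraction:.1e}")
```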

Event  trees  are  simply  logic  diagrams  that  indicate  what  specific 
combinations  of  events  occur  and  do  not  occur;  such  trees  do  not 
ordinarily  convey  any  information  as  to  the  order  in  which  events 
occur.  Thus,  the  fault  tree  models  have  to  be  carefully  constructed
to  account  for  order  when  order  is  of  concern.  For  example,
in  the  HPU  event  tree  shown  in  Figure  9.4-1,  there  are  top  events
labeled  TH  and  DH.  TH  accounts  for  the  potential  for  a turbine
runaway,  and  DH  accounts  for  the  possibility  of  a second  independent
permanent  failure  of  an  HPU.  Since  the  TH  event  appears  first  in
the  event  tree,  the  fault  tree  for  it  models  the  potential  for  a
runaway  of  one  out  of  two  HPUs.  The  DH  event  must  then  consider
the  implications  of  the  order  in  which  the  two  events  occur.  If
the  TH  event  occurs  first  (which  is  taken  to  occur  with  a probability
of  0.5),  then  the  TH  analysis  based  on  one  HPU  failing
out  of  two  is  correct,  and  the  DH  fault  tree  must  consider  the
potential  for  the  one  remaining  HPU  to  fail  (because  the  other
one,  which  is  named  HPU  B,  has  already  failed  by  runaway).  However,
if  DH  occurs  first  (with  a probability  of  0.5),  then  the  DH
fault  tree  must  be  based  on  one  out  of  two  failing,  and  the  TH
fault  tree  should  be  based  on  failing  the  one  remaining  HPU.  Since
the  TH  analysis  is  already  based  on  one  out  of  two,  a correction
factor  must  be  included  in  the  DH  fault  tree  to  correct  from  the
one-out-of-two  TH  analysis  to  the  proper  one-out-of-one  basis
needed  for  TH  in  this  case.  In  summary,  some  complexity  is  added
to  the  fault  trees  to  accurately  account  for  the  order  in  which
top  events  in  the  event  tree  could  occur.  Such  correction  factors
will  be  found  below  in  a number  of  the  fault  trees,  and  the
"secondary"  fault  trees  needed  to  cover  the  one-out-of-one  case
for  TH  (and  other  such  top  events)  are  also  presented  below.  The
specific  TH/DH  case  mentioned  here  is  discussed  (with  the  appropriate
fault  trees)  in  Section  9.5.2.3.

A  special  naming
convention  has  been  used  in  all  of  the  fault  trees.  The  first  two 
characters  are  the  same  as  the  two  characters  in  the  event  tree 
top  event  for  which  the  fault  tree  was  developed.  For  the  basic 
events,  the  third  and  fourth  characters  identify  the  type  of 
component  being  modeled,  and  the  fifth  character  identifies  its 
particular  failure  mode.  For  the  gates,  the  third,  fourth,  and 
fifth  characters  identify  the  level  of  the  gate  in  the  fault  tree 
and  distinguish  between  gates  at  each  level.  The  last  (sixth) 
character  is  A or  B to  identify  the  specific  HPU  in  which  the 
component  or  gate  resides.  If  the  last  character  is  a 0,  then  it 
identifies  a generic  component  or  gate;  that  is,  something  (such
as  a common  cause  failure)  not  associated  with  any  specific  HPU. 
The  details  about  the  first  five  characters  in  these  designators 
are  given  in  Section  11.0. 
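
The naming convention can be illustrated with a small parser. This is a sketch only: the actual component and failure mode codes are defined in Section 11.0 and are not reproduced here, so the example designators below are hypothetical. A name alone does not say whether it denotes a gate or a basic event, so the middle three characters are returned unparsed, with a helper for the basic event interpretation.

```python
from typing import NamedTuple

class Designator(NamedTuple):
    top_event: str   # characters 1-2: event tree top event
    middle: str      # characters 3-5: component/failure mode or gate level
    unit: str        # character 6: which HPU, or generic

UNITS = {"A": "HPU A", "B": "HPU B", "0": "generic"}

def parse_designator(name: str) -> Designator:
    """Split a six-character fault tree designator into its fields."""
    if len(name) != 6:
        raise ValueError("designator must have exactly six characters")
    if name[5] not in UNITS:
        raise ValueError("last character must be A, B, or 0")
    return Designator(name[:2], name[2:5], UNITS[name[5]])

def basic_event_fields(middle: str):
    """For a basic event: (component type, failure mode)."""
    return middle[:2], middle[2]

# Hypothetical designators, for illustration only.
d = parse_designator("PHFPRA")
print(d, basic_event_fields(d.middle))
print(parse_designator("PHLBC0"))
```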

To  simplify  the  general  appearance  of  the  fault  trees,  they  are 
shown  in  full  only  for  HPU  A.  That  detailed  development  is  shown 
as  a transfer  with  a label  of  the  form  XYA.  The  other  HPU  is  then 
represented  as  transfers  in  with  a label  of  the  form  XYB.  All 
gates  and  basic  events  shown  in  subtree  XYA  that  end  with  an  A are 
converted  to  a B in  subtree  XYB. 
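
The A-to-B conversion of a subtree is mechanical, as the following sketch suggests. The designators used are hypothetical; names that end in 0 (generic items) are left unchanged, on the assumption that they are shared between the two subtrees.

```python
def mirror_to_hpu_b(names):
    """Convert HPU A designators to their HPU B counterparts.

    Only a trailing 'A' is changed; generic names ending in '0' and any
    other names are passed through untouched.
    """
    return [n[:-1] + "B" if n.endswith("A") else n for n in names]

# Hypothetical contents of a subtree XYA.
subtree_a = ["PHFPRA", "PHG01A", "PHLBC0"]
subtree_b = mirror_to_hpu_b(subtree_a)
print(subtree_b)
```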

While  there  are  quite  a number  of  similarities  between  the  Orbiter
APUs  and  the  HPUs,  there  are  also  a number  of  differences.  These
differences  do  not  affect  the  overall  methods,  assumptions,
ground  rules,  or  approaches  to  the  analyses,  but  they  do  affect  the
details  of  the  analyses.  The  primary  difference  between  the  APUs
and  the  HPUs  is  that  there  are  three  APUs  involved  in  the  Shuttle 
Orbiter,  while  only  two  HPUs  are  used  for  each  of  two  SRBs.  Since 
only  one  SRB  at  a time  is  modeled,  only  two  HPUs  at  a time  are 
modeled. 

The  primary  gas  generator  valve  provides  the  control  for  both 
normal  and  high-speed  operation  of  the  HPU  turbine  (whereas  the 
secondary  valve  provides  high-speed  control  for  the  APU).  Overall,
the  GGVM  control  circuitry  for  the  HPU  is  much  simpler  than
for  the  APU.  In  particular,  no  dedicated  circuitry  is  provided
to  trip  the  HPU  turbine  in  the  event  of  an  overspeed  or  underspeed
of  the  HPU  turbine  (although  circuitry  is  provided  to  give
the  secondary  gas  generator  valve  a backup  controlling  function
in  the  event  that  the  primary  valve  fails  to  control  at  either
normal  or  high  speed).  Hence,  there  is  no  over/underspeed  inhibit
circuit  for  the  HPU.  Also,  the  fuel  tank  has  no  diaphragm,  and
there  is  only  one  fuel  tank  isolation  valve  (instead  of  the  two
valves  found  in  the  APUs).  No  cooling  water  systems  are  provided
for  the  gas  generator  injector,  the  GGVM,  the  fuel  tank  or  lines,
or  the  lube  oil.  Furthermore,  there  are  no  heater  circuits  for
the  tanks  or  lines,  but  heater  circuitry  is  provided  for  the  gas
generator  to  permit  control  of  GG  temperature  to  within  the  limits
required  for  safe  startup  of  the  HPU  during  prelaunch  operations.
All  of  these  simplifications  in  the  HPU  hardware  result  in
corresponding  simplifications  in  the  fault  tree  models  developed 
to  quantify  the  HPU  split  fractions.  The  paragraphs  that  follow 
provide  a brief  description  of  the  fault  tree  models  developed  to 
compute  the  split  fractions  for  the  top  events  in  the  HPU  event 
tree  shown  in  Figure  9.4-1.  In  general,  the  HPU  fault  trees  are 
very  similar  to  the  corresponding  APU  fault  trees  described  in 
Sections  6.5.2  and  6.5.3.  The  primary  differences  arise  from  the 
fact  that  the  HPUs  are  simpler  in  design  and  operation  than  the 
APUs. 


9.5.2  HPU  Fault  Tree  Models

Top  Event  TH

The  first  top  event  in  the  HPU  event  tree  shown  in  Figure  9.4-1 
is  TH.  This  event  represents  a specific  type  of  HPU  permanent
failure,  namely,  one  involving  turbine  runaway,  in  which  failures
cause  the  turbine  speed  to  increase  above  normal  operating
levels.  This  particular  failure  mode  has  been  separated  from  all
of  the  other  permanent  failures  because  of  the  potential  for  consequential
failure  of  the  other  HPU  or  flight-critical  equipment
due  to  the  high-energy  shrapnel  generated  by  the  overspeed.

The  fault  trees  developed  for  TH  are  shown  in  Appendices  C9.5-1  and 
C9.5-2.  The  first  fault  tree  (labeled  TH)  covers  the  model  for  the 
case  of  one  runaway  out  of  two  HPUs,  while  the  second  one  (labeled 
TH-D)  models  the  case  of  one  runaway  out  of  one  HPU.  The  second 
fault  tree  is  provided  to  support  top  events  to  the  right  in  the 
event  tree  where  the  order  in  which  events  occur  is  a consideration.
Both  fault  trees  model  runaway  in  terms  of  having  both  the  primary 
and  secondary  control  valves  open.  The  numerical  result  computed 
from  fault  tree  TH  in  Appendix  C9.5-1  directly  yields  the
requisite  split  fraction  for  the  top  event  TH  in  the  event  tree. 


Top  Event  PH

The  second  top  event  in  the  HPU  event  tree  shown  in  Figure  9.4-1 
is  PH.  This  event  represents  all  but  two  contributors  to  the 
permanent  failure  of  at  least  one  of  the  two  HPUs,  where  the  two 
exceptions  are:  (1)  the  turbine  runaway  failures  covered  by  TH, 
and  (2)  the  start  failures,  which  are  more  conveniently  analyzed 
in  the  top  event  BH  (the  failures  occurring  before  lift-off  and 
contributing  to  launch  scrub) . 

The  fault  trees  developed  for  PH  are  shown  in  Appendices  C9.5-3-1
through  C9.5-3-4  and  C9.5-4.  The  first  fault  tree  (labeled  PH)
models  the  permanent  failure  of  at  least  one  out  of  two  HPUs,
while  the  second  one  (labeled  PH-T)  models  the  permanent  failure
of  one  out  of  one  HPU.  This  second  fault  tree  is  provided  to
support  top  events  to  the  right  of  event  PH  in  the  event  tree
where  the  order  in  which  events  occur  is  a consideration.

Both  PH  fault  trees  model  permanent  failures  in  terms  of  the 
following  primary  failure  modes: 

Fuel  line  blockage 

Fuel  pump  failure 

Low  fuel  tank  pressure 

Turbine  fails  to  run 

Turbine  wheel  shutdown  failure 

Gearbox  fails  to  run 

Gas  generator  run  failure 

Fuel  tank  isolation  valve  fails  closed 

Common  cause  failure  of  lube  oil  blockage  due  to  hydrazine  leakage  through  a gearbox  shaft  seal

The  numerical  result  computed  from  fault  tree  PH  directly
yields  the  requisite  split  fraction  for  the  top  event  PH  in  the
event  tree.

Top  Event  DH 

The  third  top  event  in  the  HPU  event  tree  is  DH.  This  event 
represents  all  but  two  contributors  to  the  permanent  failure  of 
both  of  the  HPUs,  where  the  two  exceptions  are  (1)  the  turbine
runaway  failures  covered  by  TH,  and  (2)  the  start  failures,  which
are  more  conveniently  analyzed  in  the  top  event  BH  (the  failures
occurring  before  lift-off  and  contributing  to  launch  scrub).  The
only  basic  difference  between  this  event  and  the  event  PH  is  that 
DH  accounts  for  both  of  the  HPUs  failing,  while  PH  accounts  for 
at  least  one  out  of  two  HPUs  failing. 

The  PH  event  represents  the  probability  of  an  independent 
permanent  failure  occurring  in  at  least  one  HPU,  and  the  DH 
event  represents  the  probability  of  an  independent  permanent 
failure  occurring  in  both  HPUs  given  that  at  least  one  is  known 
to  have  occurred.  The  scenario  in  which  PH  occurs  and  DH  does 
not  occur  represents  the  case  in  which  exactly  one  HPU  has  an 
independent  permanent  failure.  The  scenario  in  which  both  PH 
and  DH  occur  represents  the  case  in  which  both  HPUs  have 
independent  permanent  failures.  When  the  TH  event  occurs  in  the 
event  tree,  only  the  DH  event  is  questioned  with  regard  to  the 
occurrence  of  a second  permanent  failure  as  a result  of  an  independent
cause;  that  is,  this  case  is  not  addressed  via  event  PH.
This  is  simply  an  analysis  convention  that  was  adopted  for  convenience;
this  situation  could  have  been  addressed  by  using  PH.

The  fault  trees  developed  for  DH  are  shown  in  Appendices  C9.5-5-1
through  C9.5-5-3.  Fault  tree  DH1  applies  to  the  first  (uppermost)
node  for  DH  in  the  event  tree  and  models  the  permanent  failure
of  both  HPUs  (for  use  in  conjunction  with  event  PH).  Fault  tree
DH2  applies  to  the  second  (lower)  node  for  DH  in  the  event  tree.  This
models  the  second  permanent  failure  that  occurs  in  conjunction 
with  the  turbine  runaway  failure  modeled  by  the  TH  event  and  also 
models  the  case  of  the  permanent  failure  of  the  one  remaining  HPU, 
which  is  provided  to  support  top  events  to  the  right  of  event  DH 
in  the  event  tree  where  another  failure  occurs  and  the  order  in 
which  failures  occur  is  a consideration. 

Fault  tree  DH2  is  an  illustration  of  the  logic  required  to
account  for  the  order  in  which  events  occur,  as  discussed  in
Section  9.5.1.  If  event  TH  occurs  first,  then  the  TH  one-out-of-two
fault  tree  model  is  correct,  and  the  DH  logic  must  consider
one-out-of-one  failure  logic.  This  situation  is  shown  on  the
right  side  of  the  diagram  in  Appendix  C9.5-5-6.  If,  on  the  other
hand,  DH  occurs  first,  then  the  TH  one-out-of-two  logic  must  be
corrected  to  one-out-of-one  logic,  and  the  correct  logic  for  DH
is  one-out-of-two.  The  correction  factor  represented  by  the 
basic  event  DHTCFO  is  the  ratio  of  the  result  from  the  TH-D  tree 
to  that  from  the  TH  tree. 
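
One possible reading of this ordering logic is sketched below. It assumes that the event tree has already multiplied the path by the one-out-of-two TH split fraction, so the branch in which DH occurs first must be rescaled by the TH-D/TH ratio. The actual DH2 logic is the one drawn in the appendix; the function, its argument names, and all numerical values here are illustrative assumptions.

```python
def dh2_split_fraction(p_th_1oo2, p_th_1oo1, p_dh_1oo1, p_dh_1oo2,
                       p_th_first=0.5):
    """Order-weighted DH split fraction for the node that follows TH.

    p_th_1oo2 : result of the TH tree (runaway of one out of two HPUs)
    p_th_1oo1 : result of the TH-D tree (runaway of one out of one HPU)
    p_dh_1oo1 : permanent failure of the one remaining HPU
    p_dh_1oo2 : permanent failure of one out of two HPUs
    """
    # Correction factor: ratio of the TH-D result to the TH result.
    correction = p_th_1oo1 / p_th_1oo2
    # TH first: the one-out-of-two TH basis is right, DH is one-out-of-one.
    th_first = p_th_first * p_dh_1oo1
    # DH first: DH is one-out-of-two, and TH is rescaled to one-out-of-one.
    dh_first = (1.0 - p_th_first) * p_dh_1oo2 * correction
    return th_first + dh_first

# Hypothetical inputs chosen so the arithmetic is easy to follow.
result = dh2_split_fraction(p_th_1oo2=2.0e-4, p_th_1oo1=1.0e-4,
                            p_dh_1oo1=1.0e-3, p_dh_1oo2=2.0e-3)
print(f"DH node 2 split fraction = {result:.1e}")
```

With these inputs the correction factor is 0.5, and the two orderings contribute equally to the result.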

All  of  the  fault  trees  needed  for  the  DH  event  model  permanent
failures  in  terms  of  the  following  primary  failure  modes: 

Fuel  line  blockage 

Fuel  pump  failure 

Low  fuel  tank  pressure 

Turbine  fails  to  run 

Turbine  wheel  shutdown  failure 

Gearbox  fails  to  run 

Gas  generator  run  failure 

Fuel  tank  isolation  valve  fails  closed 

Common  cause  failure  of  lube  oil  blockage  due  to  hydrazine  leakage  through  a gearbox  shaft  seal

The  numerical  result  from  fault  tree  DH1  must  be  divided  by  the 
numerical  result  from  fault  tree  PH  to  obtain  the  split  fraction 
needed  for  node  1 for  the  event  DH;  this  split  fraction  is  the 
conditional  probability  of  both  HPUs  failing  by  permanent  failure 
given  that  one  or  more  permanent  failures  are  known  to  have 
occurred.  The  numerical  result  computed  from  fault  tree  DH2
directly  yields  the  requisite  split  fraction  for  node  2 of  top
event  DH  in  the  event  tree. 

Top  Event  CH 

The  fourth  top  event  in  the  HPU  event  tree  is  CH.  This  event
represents  the  consequential  permanent  failure  of  flight  critical
equipment  or  of  at  least  one  HPU  following  the  permanent  failure
of  the  other  HPU. 

The  CH  fault  tree  is  shown  in  two  parts  in  Appendices  C9.5-6 
and  C9.5-7.  Fault  tree  CH1  applies  to  the  first  (uppermost)
node  for  CH  in  the  event  tree  and  models  the  consequential
failure  of  flight  critical  equipment  or  of  the  one  remaining  HPU
following  the  non-runaway  permanent  failure  of  one  HPU  (from
event  PH).  Fault  tree  CH2  applies  to  the  second  (lower)  node  for  CH
in  the  event  tree.  This  models  the  consequential  permanent
failure  of  flight  critical  equipment  or  of  the  one  remaining  HPU
following  a turbine  runaway  failure  (from  event  TH).  Separate
fault  trees  are  required  because  the  potential  for  consequential
failure  following  a turbine  runaway  is  higher  than  for  other
forms  of  permanent  failure.  The  numerical  results  computed  from
both  fault  trees  CH1  and  CH2  directly  yield  the  requisite  split
fractions  for  nodes  1 and  2 of  top  event  CH  in  the  event  tree. 

Top  Event  HH

The  fifth  top  event  in  the  HPU  event  tree  is  HH.  This  event 
represents  the  failure  of  at  least  one  HPU  as  a consequence  of 
an  exhaust  gas  leak  in  at  least  one  HPU.  The  model  is  based  on 
the  realization  that  the  potential  for  a non-leaking  HPU  to  fail 
is  extremely  remote.  Thus,  the  model  only  accounts  for  failures 
of  HPUs  that  are  themselves  experiencing  hot  gas  leaks.  This  is 
also  a very  low  frequency,  as  described  earlier. 

The  fault  tree  developed  for  HH  is  shown  in  Appendix  C9.5-8.
That  fault  tree  (labeled  HH)  models  the  permanent  failure  of  at
least  one  out  of  two  HPUs  as  a consequence  of  exhaust  gas  leaks.
The  numerical  result  computed  from  fault  tree  HH  directly  yields
the  requisite  split  fraction  for  the  top  event  HH  in  the  event
tree.  A subset  of  the  HH  top  event  deals  with  a path  in  which
one  failure  has  already  occurred.  The  split  fraction  for  this
path  is  modeled  as  the  HHT  tree  in  Appendix  C9.5-9.

Top  Event  GH 

The  sixth  top  event  in  the  HPU  event  tree  is  GH.  This  event  represents
the  failure  of  both  HPUs  as  a consequence  of  exhaust
gas  leaks  in  both  HPUs,  given  that  at  least  one  HPU  is  known  to 
have  failed  as  a consequence  of  a gas  leak.  The  model  is  based  on 
the  realization  that  the  potential  for  a non-leaking  HPU  to  fail 
is  extremely  remote.  Thus,  the  model  only  accounts  for  failures  of 
HPUs  that  are  themselves  experiencing  hot  gas  leaks. 

The  fault  trees  developed  for  GH  are  shown  in  Appendices  C9.5-10 
and  C9.5-11.  The  numerical  results  computed  from  those  four  fault 
trees  are  used  in  the  same  basic  manner  as  described  above  for 
event  DH  to  provide  the  requisite  split  fractions  for  the  four 
nodes  of  top  event  GH  in  the  event  tree. 

Top  Events  KA,  KB 

The  seventh  and  eighth  top  events  in  the  HPU  event  tree  are  KA  and 
KB.  These  events  represent  the  independent  occurrence  of  a fuel 
leak  in  HPU  A and  B.  Rather  than  consider  the  logic  for  these  two 
top  events  in  terms  of  a fault  tree  or  a set  of  two  fault  trees,
it  was  much  simpler  to  express  the  logic  in  terms  of  a simple 
event  tree  as  a means  of  representing  the  probability  values 
needed  for  the  various  combinations  of  leakage  occurrences.  This 
event  tree  is  shown  in  Appendix  C9.5-12.  The  split  fraction  to 
be  used  for  each  node  of  each  top  event  in  the  event  tree  is  shown
at  the  appropriate  node  in  this  figure.  Lambda  represents  the
failure  rate  with  which  independent  leakage  occurs,  and  "t"  is  the
time  interval  of  interest  over  which  the  leak  can  occur.  Beta
represents  a common  cause  factor,  which  is  a measure  of  the  conditional
probability  that  a second  HPU  has  a fuel  leak  given  that
one  is  already  known  to  be  leaking.  Lambda  and  beta  can  both  be 
estimated  from  the  Shuttle  experience  data,  as  discussed  in 
Section  11.0. 

An  important  characteristic  of  the  split  fraction  formulas  given 
for  the  various  nodes  in  Appendix  C9.5-12  is  that  the  scenario
probabilities  shown  for  the  two  scenarios  having  exactly  one  HPU
leaking  are  identical.  Also,  the  sum  of  the  probabilities
for  all  four  scenarios  is  exactly  one. 
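
The two properties just stated can be checked numerically. The split fraction formulas of Appendix C9.5-12 are not reproduced in this section, so the sketch below uses one assumed parameterization that satisfies both properties: the per-HPU leak probability is q = 1 - exp(-lambda*t), P22 is taken equal to beta, and P21 is chosen so that the two single-leak scenarios are equally likely. The function names and parameter values are hypothetical.

```python
import math

def leak_split_fractions(lam, t, beta):
    """Return (P_KA, P21, P22) under the assumed parameterization."""
    q = 1.0 - math.exp(-lam * t)          # probability that a given HPU leaks
    p_ka = q                              # KA: HPU A leaks
    p22 = beta                            # KB given KA occurred
    p21 = q * (1.0 - beta) / (1.0 - q)    # KB given KA did not occur
    return p_ka, p21, p22

def scenario_probabilities(p_ka, p21, p22):
    """Probabilities of the four KA/KB leakage scenarios."""
    return {
        "neither": (1.0 - p_ka) * (1.0 - p21),
        "A only": p_ka * (1.0 - p22),
        "B only": (1.0 - p_ka) * p21,
        "both": p_ka * p22,
    }

# Hypothetical parameter values, for illustration only.
probs = scenario_probabilities(*leak_split_fractions(lam=1.0e-4, t=0.05, beta=0.1))
for name, p in probs.items():
    print(f"{name:8s} {p:.3e}")
```

Choosing P21 in this way is a modeling convenience that enforces the symmetry between HPU A and HPU B; the normalization then follows automatically from the event tree structure.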

To  use  the  leakage  split  fractions  listed  in  Appendix  C9.5-12,  it 
is  simply  a matter  of  matching  the  nodes  in  that  figure  with  the 
corresponding  nodes  in  the  event  tree.  That  is,  the  split 
fraction  P21  for  node  1 of  the  event  KB  is  matched  to  all  nodes 
in  the  event  tree  for  which  KB  occurs  when  KA  does  not  occur. 
Likewise,  the  split  fraction  P22  for  node  2 of  the  event  KB  is 
matched  to  all  nodes  in  the  event  tree  for  which  KA  does  occur. 

Top  Event  FH

The  ninth  top  event  in  the  HPU  event  tree  is  FH.  This  event 
represents  the  permanent  failure  of  flight  critical  equipment 
as  a direct  consequence  of  a fuel  leak  in  one  or  more  HPUs. 

No  fault  tree  was  constructed  for  this  event  since  the  requisite 
split  fraction  is  simply  one  number  that  depends  only  on  the 
specific  leakage  conditions  for  the  scenario  being  analyzed.
The  development  of  those  single  split  fractions  is  discussed  in
Section  10.0. 

Top  Events  BA,  BB 

The  tenth  and  eleventh  top  events  in  the  HPU  event  tree  are  BA
and  BB.  These  events  represent  the  consequential  failure  of
either  HPU  due  to  a fuel  leak  in  one  of  the  HPUs  (the  leak  can
be  in  either  HPU,  the  specific  condition  depending  entirely  on
the  particular  event  tree  scenario  being  analyzed).

No  fault  trees  were  constructed  for  these  events  since  the  requisite
split  fractions  are  simply  single  numbers  that  depend  only  on 
the  specific  leakage  conditions  for  the  scenario  being  analyzed. 
The  development  of  those  single  split  fractions  is  discussed  in 
Section  10.0. 

Top  Event  BH 

The  twelfth  top  event  in  the  HPU  event  tree  is  BH.  This  event 
represents  a correction  factor  to  distinguish  between  failures 
occurring  before  and  after  lift-off.  The  prior  events  in  the 
event  tree  account  for  all  run  failures,  regardless  of  the 
time at which they occur while the HPUs are running. Failures
occurring  before  lift-off  ordinarily  result  in  launch  scrub, 
while  failures  occurring  afterward  can  result  in  either  LOC/V 
or  success,  depending  on  their  severity. 

The fault trees developed for BH are shown in Appendices C9.5-13
and  C9.5-14.  Two  trees  are  shown:  one  labeled  BHO  applies  only 
to  the  first  node  for  the  BH  event  in  the  event  tree;  the  other, 
labeled  BH,  applies  to  all  other  nodes.  The  BHO  fault  tree 
accounts  for  all  start  failures  which  are  not  otherwise  taken  into 
account  in  the  fault  trees  developed  for  all  other  top  events  in 
the event tree. Start failures, of course, all occur before lift-off
and are, therefore, all prelaunch failures that ordinarily
lead  to  launch  scrub.  Such  failures  should  not  be  considered 
elsewhere  in  the  event  tree  logic.  The  BHn  fault  tree  accounts 
for  the  start  failures  and  the  proportion  of  run  time  that 
constitutes the pre-lift-off period. This is a simple time ratio
of prelaunch run time to total HPU run time. The prelaunch run
time is 30 seconds (0.5 minute), while the post-launch HPU run time
is 2.1 minutes, yielding a ratio of R = 0.5/2.6 for scenarios in which
one HPU has failed. The ratio becomes 2R - R^2 for scenarios in
which  two  HPUs  have  failed.  The  numerical  result  computed  from 
fault  tree  BH  directly  yields  the  requisite  split  fraction  for 
top  event  BH  in  the  event  tree. 
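
The time ratio described above can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and reading 2R - R^2 as the chance that at least one of two independently timed failures falls in the prelaunch window is an interpretation, not something the text states.

```python
# Run times quoted in the text, in minutes.
PRELAUNCH_MIN = 0.5    # 30 seconds of HPU run time before lift-off
POSTLAUNCH_MIN = 2.1   # HPU run time after lift-off


def prelaunch_ratio(pre_min, post_min):
    """Fraction of total HPU run time that falls before lift-off."""
    return pre_min / (pre_min + post_min)


def two_hpu_ratio(r):
    """Ratio for scenarios with two failed HPUs: 2R - R^2.

    Assumption: this equals 1 - (1 - R)^2, i.e. at least one of two
    independently timed failures occurs before lift-off.
    """
    return 2.0 * r - r ** 2


R = prelaunch_ratio(PRELAUNCH_MIN, POSTLAUNCH_MIN)
R_TWO = two_hpu_ratio(R)
print(f"R (one HPU failed)   = {R:.4f}")
print(f"2R - R^2 (two failed) = {R_TWO:.4f}")
```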


9.6  SPATIAL  INTERACTIVE  EVENTS  (SIEs) 

An  SIE  is  a cascading  failure  within  one  system  that  results  from 
an  initiating  failure  or  condition  in  another  system.  To  be  an 
SIE,  a consequential  failure  must  also  be  initiated  by  means  of 
a physical  interactive  mechanism  such  as  hot  gas  or  shrapnel  that 
results  from  failure  of  or  degraded  operation  of  the  system. 

Thus, a detonation of fuel in an HPU Gas Generator Valve Module
(GGVM) because of an exhaust leak in another HPU is a spatial
interaction  event,  whereas  loss  of  an  HPU  because  of  a secondary 
fuel  valve  failing  in  the  closed  position  in  the  GGVM  is  not. 

The  split  fraction  representing  an  SIE  is  modeled  as  a conditional 
probability  distribution  as  described  in  Section  5.5.  The  SIE 
split  fractions  discussed  in  this  analysis  are  a subset  of  the  set 
of all split fractions defined by the node points on the HPU event
trees.

Three  types  of  SIEs  have  been  identified  as  significant  for  this 
PRA.  They  are  (1)  events  related  to  HPU  turbine  breakup,  (2) 
events  related  to  HPU  fuel  (hydrazine)  leakage,  and  (3)  events 
related  to  hot  exhaust  gas  leakage.  The  three  categories  of  SIEs 
are  discussed  in  the  paragraphs  below. 


9.6.1 Events Related to HPU Turbine Breakup

The  SIEs  that  result  from  HPU  turbine  breakup  are  identical  in 
nature  to  those  for  the  APU  discussed  in  Section  6.6.1.  There 
are,  however,  significant  differences  between  the  HPU  and  APU 
design  and  their  operating  environment  that  affect  the  SIE 
conditional probabilities. The frequency of SIEs initiated by
HPU turbine fragments is described by conditional probability
distributions  defined  in  Section  10.5.1.  The  differences  which 
affect  these  probabilities  are  discussed  below. 

Conditional  probabilities  related  to  HPU  turbine  breakup  are 
affected  by  two  design  differences.  First,  the  fuel  control 
valves  in  the  HPU  Gas  Generator  Valve  Module  (GGVM)  are  different 
from  those  in  the  APU.  The  valves  in  the  HPU  are  considered  less 
likely to fail open and thus cause an HPU overspeed. Secondly,
the  HPU  containment  ring  is  26%  larger  than  the  APU  ring.  This 
means  that  there  is  a much  lower  probability  of  uncontained 
fragments.  It  also  means  that  any  fragments  that  are  uncontained 
may  be  at  a lower  energy  level  and  hence  less  likely  to  damage 
other  equipment. 

The  probability  that  an  HPU  turbine  will  break  up  at  normal  speed 
is significantly lower than that for an APU because the HPU runs
only 160 seconds during a mission, is not required to restart in
flight  after  liftoff,  and  is  disassembled,  inspected,  and 
refurbished  after  each  mission. 

The  probability  that  an  item  of  flight  critical  equipment  will  be 
struck by a turbine fragment is lower for the HPU than for the
APU for two reasons. There are fewer pieces of flight critical
equipment  in  the  Solid  Rocket  Booster  (SRB)  aft  section  than  in 
the  Orbiter  compartment,  and  the  location  and  orientation  of  the 
HPUs  are  such  that  turbine  fragments  from  an  HPU  cannot  directly 
impact  a second  HPU,  its  fuel  line,  or  its  fuel  tank. 

The  location  and  orientation  of  the  HPUs  also  preclude  a turbine 
fragment  from  directly  striking  the  external  tank. 


9.6.2  Events  Related  to  HPU  Fuel  Leakage 

The  SIEs  that  result  from  HPU  fuel  leakage  are  those  in  which 
leakage of HPU fuel leads to damage of flight critical equipment
or  an  HPU.  This  section  presents  information  that  is  relevant 
to  the  establishment  of  split  fractions  for  the  associated 
conditional  probabilities  to  be  input  to  the  Probabilistic  Risk 
Analysis  (PRA)  model.  The  frequency  of  SIEs  associated  with  HPU 
fuel  leakage  is  reflected  in  conditional  probabilities  defined  in 
Section  10.5. 

Leaking  HPU  fuel  (hydrazine)  can  damage  equipment  by  means  of 
corrosion,  fire,  or  detonation. 


9.6.2.1 Corrosion Damage Resulting from HPU Fuel Leakage

Hydrazine  can  dissolve  Kapton  used  for  wire  insulation  in  the  SRB 
aft compartment. However, the 160-second maximum possible
exposure  of  the  Kapton  to  leaking  hydrazine  is  believed  to  be  too 
short  a period  for  serious  damage  to  occur. 


9.6.2.2 Fire Damage Resulting from HPU Fuel Leakage

Prior  to  HPU  activation  the  SRB  aft  skirt  area  is  purged  with 
nitrogen  until  the  oxygen  level  is  reduced  to  less  than  4 percent 
by volume (Reference 52). A hydrogen fire is not possible with
so little oxygen. A hydrazine fire is also unlikely under these
conditions (see References 88 and 89 for additional information).
A flexible barrier separates the SRB aft skirt area from the
external atmosphere. This barrier prevents the oxygen level from
increasing  to  an  unsafe  level  during  ascent. 


9.6.2.3 Detonation Damage Resulting from HPU Fuel Leakage

Since hydrazine combustion cannot occur in the atmosphere of the
SRB aft skirt area, no fire-induced hydrazine detonation can
result from HPU fuel leakage. However, detonation damage resulting
from HPU fuel leakage into solenoid cavities of the fuel isolation
and control valves is still a potential problem which the HPU
shares with the APU. This was discussed in Section 6.6.2.2.


9.6.3  Events  Related  to  Hot  HPU  Exhaust  Gas  Leakage 

The  SIEs  that  result  from  hot  HPU  exhaust  gas  leakage  are  those 
in  which  hot  gas  leakage  damages  Flight  Critical  Equipment  (FCE) 
or  an  HPU.  This  section  presents  information  that  is  relevant 
to  the  establishment  of  split  fractions  for  the  associated 
conditional  probabilities  to  be  input  to  the  PRA  model.  Values 
assigned  to  the  split  fractions  are  discussed  in  Section  11.2. 


9.6.3.1 High Pressure Hot HPU Exhaust Gas Leakage

SIEs associated with high pressure hot HPU exhaust gas leakage
are identical in nature to those discussed in Section 6.6.3.1.
The possibility of damage from this event is considered remote
and, as a simplifying assumption in the PRA, the probability was
considered negligible.


9.6.3.2 Low Pressure Hot HPU Exhaust Gas Leakage

Low  pressure  hot  HPU  exhaust  gas  leakage  is  a potential  problem 
common  with  the  APU. 


9.6.3.2.1 Solid Rocket Booster (SRB) aft area damage. - The
HPU exhaust consists of a mixture of N2, H2, and NH3 gases at a
temperature which varies with time from HPU startup, with
position along the exhaust duct, with altitude, and with the
rate of fuel flow. No insulation is employed to protect the
HPUs, the hydrazine fuel lines, or the hydraulic lines from
exposure to hot gas leaking from the uninsulated HPU exhaust
ducts.

Pressure and temperature of the exhaust are highest at the HPU
end of the exhaust duct mainly because of drag. Assuming the
pressure in the aft skirt area is the same as that of the
environment into which the exhaust duct terminates, the
pressure difference between the exhaust and the aft skirt area
is less than 5.4 pounds per square inch (psi). The maximum
temperature of 1035°F occurs only at the end of the 160-second
HPU run time when the SRB is at high altitude.

An exhaust leak at the point where the exhaust duct joins the
HPU could conceivably damage the HPU by damaging the associated
electrical wiring insulation. The wiring insulation is Teflon
with Kapton tape wrapping, which is destroyed by sustained
exposure to temperatures of 500°F or above.

The  HPUs  are  protected  by  obstructions  from  direct  exhaust  leak 
plume impingement unless the leak occurs immediately at the HPU.
The SRB nozzle actuators, the hydraulic lines, and the HPU fuel
lines are protected from thermal damage by the flow of a heat
absorbing  fluid. 

Exhaust  leakage  from  an  HPU  may  impinge  upon  the  associated  Fuel 
Supply  Module  (FSM)  but  not  upon  the  FSM  of  the  other  HPU.  Thus 
an  explosion  of  the  FSM  due  to  hydrazine  detonation  cannot  occur 
since the associated HPU will first be disabled by the
internal  detonation  of  hydrazine  which  has  been  heated  to  the 
detonation  temperature  when  passing  through  the  HPU  itself. 


9.6.3.2.2 HPU shutdown due to hydrazine detonation. - An
analysis has been performed to gain insight regarding conditions
leading to hydrazine detonation given HPU exhaust leak impingement
upon an FSM (Reference 83).

Gas flowing through the HPU exhaust duct is treated as an
ideal gas flowing adiabatically with constant friction.
Hot gas leakage is assumed to occur at the location nearest the FSM,
which would place the leak 24 inches from the exhaust outlet of
the 2-inch inside diameter (ID) duct. Data from Reference 33
suggest a friction factor of 0.58 would be appropriate for the
exhaust duct. This value reflects a degree of roughness of the
exhaust duct which allows the use of a constant friction factor
over a range of Reynolds numbers resulting from a large variation
in mass flow rate and temperature. The temperature of gas flowing
past the leak point will vary roughly linearly as a function of
time between 78°F and 1006°F during the 160-second run of the
HPU.

Reference 33 indicates that the maximum design inlet temperature
of hydrazine to the APU is 150°F and the maximum operating
temperature of the GGVM is 200°F. Above 200°F hydrazine is known
to form bubbles which in principle could lead to detonation if
adiabatic compression causes a local temperature in excess of the
autodecomposition temperature of about 445°F. Experiments will be
required to determine the actual maximum allowed temperature and
whether this maximum temperature can result from adiabatic
compression of bubbles or from the heating of hydrazine when passing
through the gas generator injection tube. In any case, one may
conclude that the temperature of the hydrazine increases by 50°F
between the inlet into the APU and the exit from the GGVM. Thus
the maximum allowed FSM temperature must lie within the wide
temperature range of 150°F to 395°F.
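
The roughly linear exhaust gas temperature profile at the leak point can be sketched as follows. This is a minimal illustration using only the endpoint values quoted in this section; the function names are hypothetical, and the gas temperature is not the same as the hydrazine temperature inside the FSM, so the crossing times are only a reference scale.

```python
# Endpoints quoted in the text for gas flowing past the leak point.
T_START_F = 78.0     # gas temperature at HPU start, deg F
T_END_F = 1006.0     # gas temperature at end of run, deg F
RUN_TIME_S = 160.0   # HPU run time, seconds


def gas_temp_f(t_s):
    """Gas temperature (deg F) at time t_s, assuming a linear ramp."""
    return T_START_F + (T_END_F - T_START_F) * t_s / RUN_TIME_S


def time_to_reach_s(temp_f):
    """Time (s) at which the linear ramp reaches temp_f."""
    return (temp_f - T_START_F) * RUN_TIME_S / (T_END_F - T_START_F)


# Reference temperatures mentioned in the text (deg F).
for label, temp in [("GGVM maximum operating", 200.0),
                    ("hydrazine autodecomposition", 445.0)]:
    print(f"{label}: gas reaches {temp:.0f} F at t = {time_to_reach_s(temp):.1f} s")
```
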

The minimum distance between the FSM and the HPU exhaust duct is
about 3.75 inches. At high altitude the leaking exhaust gas
will lose energy by expanding and propagating a shock wave into
the compartment atmosphere. The 15-inch diameter FSM occupies 25
percent  of  the  solid  angle  in  the  hemisphere  defined  by  the 
nearest  leak  location  and  a line  connecting  the  center  of  the 
FSM.  This  is  suggestive  of  a low  efficiency  of  thermal  transfer 
between  the  leaking  exhaust  jet  and  the  FSM. 
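
The 25 percent figure is consistent with a simple geometric check. The sketch below assumes the FSM can be treated as a sphere of 15-inch diameter viewed from a point 3.75 inches off its surface; the spherical shape and the function name are assumptions made here for illustration, not statements from the referenced analysis.

```python
import math

FSM_DIAMETER_IN = 15.0   # FSM diameter quoted in the text
GAP_IN = 3.75            # minimum distance from exhaust duct to FSM


def hemisphere_fraction(diameter_in, gap_in):
    """Fraction of a hemisphere's solid angle subtended by a sphere.

    A sphere of radius r seen from distance d (to its center) subtends a
    cone of half-angle theta with sin(theta) = r / d. The cone's solid
    angle is 2*pi*(1 - cos(theta)); a hemisphere is 2*pi.
    """
    radius = diameter_in / 2.0
    dist_to_center = radius + gap_in
    cos_theta = math.sqrt(1.0 - (radius / dist_to_center) ** 2)
    return 1.0 - cos_theta


FRACTION = hemisphere_fraction(FSM_DIAMETER_IN, GAP_IN)
print(f"Fraction of hemisphere subtended by FSM: {FRACTION:.3f}")
```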

Clearly, small HPU exhaust duct leaks (which are expected to be
far more common than larger leaks) will not lead to a loss of an
HPU. The most extreme leak, one diverting all of the HPU exhaust
flow, would still need to transfer a significant fraction of its
thermal energy to the FSM in order to result in loss of the HPU.

Since the FSM is covered with foam insulation to a depth
of 1.25 inches, and the insulation is, in turn, surfaced with
aluminized tape, even the largest exhaust leak is not expected to
lead to loss of the associated HPU by means of hydrazine
detonation.

The possibility of damage resulting from hot HPU exhaust gas
leakage is considered to be remote. The probability of such
damage was considered negligibly small.


10.0 HPU DATA DEVELOPMENT


This  section  describes  the  process  used  to  develop  probability 
distributions  for  HPU  component  failure  rates.  Probability 
distributions  are  used  in  this  context  to  reflect  the  fact  that 
component  failure  rates  are  uncertain.  The  use  of  probability 
distributions  provides  a complete  description  of  our  state  of 
knowledge  about  the  failure  rates  of  the  equipment  in  question, 
including  any  sources  of  variability  among  similar  components. 

By  contrast,  use  of  a point  estimate  would  imply  a degree  of 
exactness  that  is  not  justified  by  the  data. 

It  is  important  to  bear  in  mind  that  the  existence  of  uncertainty 
about  component  failure  rates  does  not  imply  that  the  results  are 
inaccurate  or  that  they  reflect  a state  of  ignorance  on  the  part 
of the analysts. Rather, uncertainty arises from a number of
sources:


a.  The  relatively  small  amount  of  data  that  is  available 
on  many  components 

b.  The  possibility  of  missing  data  (e.g.,  failures  that 
are  not  captured  by  the  data  collection  process) 

c.  Decisions  about  whether  incipient  failures  should  be 
included  in  the  data  analysis 

d.  Estimation  of  the  applicable  exposure  data  (e.g.,  the 
total  number  of  hours  that  a component  operated) 

e.  The  application  of  data  from  one  situation  (e.g., 
checkout)  to  other  situations  such  as  actual  flights 

f.  The assumption that failure rates are constant over time

g.  Differences  in  component  reliability  from  one  mission 
to  another  (e.g.,  due  to  differences  in  the  quality  of 
refurbishment)

h.  Differences  in  component  reliability  from  one  HPU  to 
another,  or  between  similar  components  in  the  same  HPU 

i.  The extrapolation of failure rate estimates developed
for other applications (e.g., aircraft) to the space
shuttle

j.  The environmental factors that should be used in
adjusting failure rate estimates from one application
to another

The  approach  used  in  this  study  to  describe  and  quantify  such 
uncertainties  is  the  Bayesian  theory  of  probability.  In  this 
approach,  each  basic  event  frequency  is  described  by  a 
probability  distribution  specifying  the  various  possible  values 
for  that  frequency  and  how  likely  each  value  is.  The  Bayesian 
approach is capable of taking into account both engineering
judgment  about  the  event  frequency,  and  also  empirical  data 
such  as  the  actual  number  of  failures  and  operating  hours 
accrued  to  date  for  the  HPU. 

In  particular,  a prior  probability  distribution  is  specified  to 
reflect  all  the  available  information  on  similar  components  in 
other  applications,  as  tempered  by  the  engineering  judgment  of 
the  analysis  team.  This  distribution  is  generally  then  updated 
with  the  observed  HPU  data  to  yield  a revised  (i.e.,  posterior) 
distribution.  In  other  cases,  the  posterior  distribution  is  simply 
set  equal  to  the  prior  distribution,  and  no  update  is  performed. 
This  is  done  in  cases  where  little  or  no  HPU  data  is  available 
for  use  in  the  update;  e.g.,  in  modeling  hourly  failure  rates  for 
failures  that  have  not  occurred  to  date. 

The  use  of  judgment  is  in  keeping  with  the  Bayesian  theory  of 
probability. In particular, the judgment of an analysis team
that  is  knowledgeable  about  equipment  reliability  is  a valid 
form  of  evidence  for  use  in  formulating  distributions;  experience 
has  shown  that  the  judgment  of  experienced  analysts  is  often 
remarkably  close  to  actual  data  when  the  two  have  been  compared. 

For  example,  several  studies  of  component  reliability  have  found 
expert  estimates  of  component  failure  rates  to  be  typically  within 
a factor  of  2 to  4 from  the  observed  failure  rates. 

Section  10.1  describes  the  raw  data  sources  from  which  HPU  failure 
data  was  obtained.  These  sources  include  such  documents  as 
MSFC  Problem  Assessment  System  reports,  anomaly  reports,  and  so 
on. For most spatial interaction events (SIEs), virtually no
empirical data was available. Therefore, judgmental
distributions were developed for the frequencies of these events
(e.g., the likelihood of damaging an HPU as the result of a
turbine overspeed).

The  process  used  for  developing  SIE  distributions  and  the  resulting 
judgmental distributions are described in Section 10.5. These
distributions were based on extensive knowledge of such events,
and  also  on  a number  of  analytical  studies  performed  specifically 
in  support  of  this  PRA. 

Section  10.3  describes  the  categories  of  component  failures  for 
which  data  was  collected,  and  the  guidelines  and  criteria  that 
were  used  for  determining  which  events  (e.g.,  incipient  failures) 
would  be  considered  failures  in  this  study. 

In  general,  the  criteria  specified  in  Section  10.3  are  fairly 
conservative.  For  example,  checkout  data  was  included  in  the 
database  on  exactly  the  same  basis  as  flight  data.  Despite  this, 
however,  no  HPU  failures  were  identified  in  the  flight  and 
checkout  data  reviewed  in  this  study.  Finally,  Section  10.4 
presents  the  actual  prior  and  posterior  distributions  that  were 
developed  for  the  categories  of  components  specified  in  Section 
10.3.  The  sources  of  data  used  to  generate  and  update  the 
distributions  for  the  various  failure  rates  are  also  indicated. 

The  Bayesian  analysis  that  was  used  to  develop  the  posterior 
distributions  automatically  determines  the  appropriate  weights 
to  assign  to  the  observed  data  and  the  prior  distribution,  based 
on  the  relative  strength  of  the  two  types  of  evidence  in  each 
particular  situation.  For  example,  if  the  prior  distribution 
is  extremely  broad  (reflecting  a high  degree  of  uncertainty  on 
the  part  of  the  analysis  team)  and  there  is  a moderate  amount 
of  empirical  data  available,  then  the  data  will  tend  to  dominate 
the  posterior  distribution.  By  contrast,  if  there  is  very  little 
empirical  data  available,  then  the  posterior  distribution  will 
tend  to  look  similar  to  the  prior  distribution. 
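
The way a Bayesian update weights prior and data can be illustrated with a conjugate model. The sketch below assumes a Gamma prior on an hourly failure rate with Poisson-distributed failures; this distributional form, the function names, and all numerical values are hypothetical choices made for illustration and are not taken from this study. Under that assumption the posterior mean is a weighted average of the prior mean and the observed rate, with the weight on the prior shrinking as operating time accumulates.

```python
def gamma_poisson_update(shape, rate, failures, hours):
    """Posterior (shape, rate) of a Gamma prior after Poisson data."""
    return shape + failures, rate + hours


def prior_weight(rate, hours):
    """Weight placed on the prior mean in the posterior mean."""
    return rate / (rate + hours)


# Hypothetical data: 2 failures in 1000 operating hours.
FAILURES, HOURS = 2, 1000.0

# Two hypothetical priors with the same mean (0.01 per hour):
# a broad one (little prior information) and a narrow one.
PRIORS = {"broad": (0.5, 50.0), "narrow": (50.0, 5000.0)}

for name, (shape, rate) in PRIORS.items():
    post_shape, post_rate = gamma_poisson_update(shape, rate, FAILURES, HOURS)
    w = prior_weight(rate, HOURS)
    print(f"{name:6s} prior: weight on prior = {w:.3f}, "
          f"posterior mean = {post_shape / post_rate:.5f} per hour")
```

With the broad prior the data dominate the posterior mean; with the narrow prior the posterior stays close to the prior mean.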

Due  to  the  high  reliability  of  the  components  and  the  extremely 
limited  amounts  of  flight  and  hot  firing  time  accrued  to  date,  no 
flight  or  checkout  failures  were  identified.  The  distributions 
for  the  various  demand  failure  rates  were  updated  based  on  the 
observed  data  of  zero  failures  in  the  total  number  of  demands 
to  date,  using  exactly  the  same  procedure  as  would  be  used  if 
failures  had  occurred.  The  posterior  distribution  resulting  from 
this  process  tends  to  be  somewhat  lower  than  the  prior,  especially 
when  the  prior  distribution  extends  to  very  high  failure  rates, 
which  are  inconsistent  with  the  observation  of  zero  failures. 
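
An update on zero observed failures can be sketched with a conjugate Beta prior on a demand failure probability. The Beta form, the prior parameters, and the demand count below are hypothetical assumptions for illustration; the text does not state which distributional form or demand totals were used.

```python
def beta_update_zero_failures(a, b, demands):
    """Posterior Beta(a, b) parameters after `demands` successes, no failures."""
    return a, b + demands


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def zero_failure_likelihood(p, demands):
    """Probability of seeing no failures in `demands` trials at rate p."""
    return (1.0 - p) ** demands


A, B = 0.5, 99.5   # hypothetical prior, mean 0.005 per demand
DEMANDS = 48       # hypothetical number of demands to date

POST_A, POST_B = beta_update_zero_failures(A, B, DEMANDS)
print(f"prior mean     = {beta_mean(A, B):.5f}")
print(f"posterior mean = {beta_mean(POST_A, POST_B):.5f}")

# High candidate failure rates are strongly down-weighted by the data.
for p in (0.001, 0.01, 0.1):
    print(f"p = {p}: likelihood of zero failures = "
          f"{zero_failure_likelihood(p, DEMANDS):.4f}")
```

The posterior mean falls below the prior mean, and the likelihood factor shows why: the higher the candidate failure rate, the less compatible it is with an unbroken run of successes.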

Because  of  the  very  limited  amount  of  flight  and  checkout 
operating  time  accrued  to  date,  it  was  clear  that  the  posterior 
distributions  for  hourly  failure  rates  would  look  virtually 
identical  to  the  priors.  Because  of  this,  no  updates  were 
performed for the hourly failure rates, but a great deal of
effort was devoted to the development of the prior distributions
for  these  failure  rates.  In  particular,  available  information 
from many different sources of reliability data (e.g., the
Nonelectronic Parts Reliability Data handbook prepared by the Rome
Air  Development  Center)  was  used  to  guide  the  engineering  judgment 
of  the  analysis  team. 


10.1  HPU  RAW  DATA  SOURCES 

The  accuracy  of  any  technical  study  or  report  is  dependent  on 
the  accuracy,  quality,  and  availability  of  the  input  data.  It 
was  recognized  prior  to  the  start  of  this  study  that  collection 
and  validation  of  the  HPU  data  would  be  important  to  the  quality 
and  accuracy  of  the  final  results.  Particular  attention  was 
given  to  the  use  of  engineering  judgment  in  the  data  development 
process,  especially  in  light  of  the  limited  amount  of  HPU 
operating  experience  accumulated  to  date. 

Two basic types of data were required: (a) exposure data
indicating how long the various HPU components had operated;
and (b) data indicating how many failures each given component
had experienced over the exposure period. For those components
that did experience failures, information would also be needed
on the failure modes that were observed.

It  was  judged  that  utilizing  Qualification  Test  (Qual)  data  would 
not  produce  reasonable  failure  rates.  The  failures  associated 
with  the  Qual  test  program  phase  would  likely  represent  flaws  in 
the  early  design  or  manufacturing  process.  These  failures  would 
not  necessarily  be  indicative  of  the  final  flight  or  production 
components  or  of  later  refinements  in  the  manufacturing  process. 

The  Acceptance  Test  (ATP)  phase  is  the  next  level  of  component 
development  for  which  data  was  known  to  be  available.  This  data 
was  considered  to  be  of  value  in  tracking  failures  from  the  time 
of contractor component or system delivery to end-of-life.

However,  it  was  decided  to  exclude  the  ATP  data  from  the  analysis 
because  of:  (a)  the  lack  of  information  on  actual  design  changes 
resulting  from  ATP  failures,  (b)  the  inability  to  screen  out 
facility  failures  and  anomalies  caused  by  facility  or  test  setups, 
and  (c)  the  lack  of  time  and  funding  available  in  this  study  to 
ensure  that  the  failures  identified  in  the  ATP  data  were 
representative  of  actual  flight  configurations. 


Launch  checkout  and  flight  data  were  selected  as  the  most 
meaningful  data  to  support  this  analysis.  This  data  represents 
the  HPU  system  in  the  flight  configuration  and  environment. 
Moreover,  it  was  judged  that  any  valid  failure  modes  identified 
in  Qual  or  Acceptance  tests,  and  not  corrected,  would  be 
reflected  in  flight  failure  rates,  thus  reducing  the  effect  of 
not  including  data  from  these  development  categories. 

Several  sources  of  launch  checkout  and  flight  data  were  found 
to  be  available  and  accessible  during  the  study  time  frame; 
these  sources  are  described  below.  These  sources  were  utilized 
to  develop  mission  time  histories  dating  from  1 January  1981 
through Flight #24. Other sources such as NASA/contractor test
reports  and  discussions  with  knowledgeable  personnel  were  used 
as  an  information  base  to  assist  in  the  development  of 
probability  distributions  for  the  Spatial  Interactive  Events. 

The  information  from  all  sources  was  analyzed  using  a specific 
set  of  criteria  to  track  and  identify  legitimate  HPU  failures. 
These  criteria  are  discussed  in  Section  10.3. 

The  salient  information  needed  to  develop  flight  rates  and 
mission  sequences  was  compiled  as  a basis  for  developing  model 
input  data.  The  individual  data  sources  and  their  use  in  this 
study  are  discussed  below. 

a.  Marshall  Space  Flight  Center  (MSFC)  Problem  Assessment 
System,  Problem  Reports 

b.  Shuttle  Flight  Data  and  In-flight  Anomaly  List 

c.  JSC  Mission  Reports,  Missions  1 through  23 

d.  Study  and  test  reports  from  NASA  and  contractor  sources 
and  published  technical  documents 


10.1.1  MSFC  Problem  Assessment  System 

Each  problem  record  pertaining  to  the  SRB  Thrust  Vector  Control 
Subsystem  was  extracted  from  the  MSFC  Problem  Assessment  System 
database  and  screened  for  applicability  to  the  HPU.  Review  of 
this data determined that no flight or hot fire test anomalies
or failures were experienced. The fact that the HPU experienced
no flight or hot fire test failures represents "success" data,
and the application of this data for establishing HPU failure
distributions is discussed in Section 10.3.


10.1.2 Shuttle Flight Data and In-Flight Anomaly List

The  Shuttle  Flight  Data  and  In-flight  Anomaly  List  is  a historical 
report  of  flight-related  information.  It  also  includes  in-flight 
anomalies  and  references  to  problems  encountered  during  the  STS 
missions.

a.  Initial  altitude  and  inclination 

b.  Mission  sequence  number,  flight  and  orbiter  number 

c.  Solid  Rocket  Booster  (SRB)  Separation  (SEP)  time 

d.  Other  mission-related  data 

The  mission  related  portion  of  the  data  was  used  to  develop  a 
mission  timeline  database,  combining  similar  information  from 
contractor  furnished  HPU  run  times. 


10.1.3  JSC  Mission  Reports 

The  JSC  Mission  Reports  were  used  to  collect  mission-related 
data.  These  reports  were  also  used  as  a reference  when  mission 
information  obtained  from  other  data  sources  required  further 
clarification. 


10.1.4 Study Reports, Test Results, & Personal Communications

Some  of  the  failure  modes  under  consideration  during  this  study 
have  a very  low  likelihood  of  occurrence.  Some  are  of  such  a 
nature  that  directly  applicable  test  data  does  not  exist;  e.g., 
some  catastrophic  SIEs.  In  order  to  estimate  these  likelihoods, 
information  from  a large  number  of  study  and  test  reports  from 
NASA  and  contractor  sources  and  other  technical  publications  was 
utilized. 

Much of the information used to supplement the written reports
was obtained through telecons with knowledgeable people
in specialized fields at JSC, MSFC, and other locations. Tests are
presently being conducted at White Sands Proving Grounds on the
properties of hydrazine and its effect on certain materials. The
results were not available for consideration and application in
this study.


10.2  SPATIAL  INTERACTIVE  EVENT  DATA 


Table  10.2-1  presents  the  HPU  SIE  split  fraction  distributions  in 
the  format  used  for  entry  into  the  PRA  model.  These  distributions 
and  the  information  supporting  their  development  are  discussed 
individually  in  Section  10.5  and  are  presented  here  for  clarity 
and  convenience. 


10.3  DATA  CATEGORIZATION 

A number of guidelines and criteria were established for the HPU
data categorization task. These are each discussed below.

a.  Failures occurring before January 1, 1981, were omitted
from  the  database  on  the  grounds  that  the  HPU  was  still 
undergoing  design  development  prior  to  that  time. 

b.  Failures occurring during qualification tests (QUAL),
acceptance tests (ATP), helium leak tests, or HPU assembly
and refurbishment were not included in the database for
this  project.  These  tests  were  thought  to  be  largely 
inapplicable,  on  the  basis  that  bench  tests  of  individual 
components  or  sub-assemblies  do  not  reflect  the  actual 
operation  of  a completed  HPU.  In  addition,  since  these 
tests  are  often  performed  early  in  the  process  of  readying 
an  HPU  for  flight,  they  detect  many  types  of  failures  that 
would  not  be  expected  during  an  actual  flight. 

c.  Both checkout tests (CKO) and actual flights (FLT) were
considered relevant for inclusion in the database.
However, no applicable flight or checkout failures were
identified.

d.  Incipient  failures  (e.g.,  turbine  blade  cracking)  were  not 
explicitly  included  in  the  database  as  actual  failures. 
However,  the  history  of  incipient  failures  to  date  was  taken 
into  account  qualitatively  in  establishing  appropriate  prior 
distributions.

TABLE 10.2-1 SPATIAL INTERACTIVE EVENT HPU ASCENT DISTRIBUTIONS

e.  Failures of components that are outside the scope of our HPU
model were excluded from the database for obvious reasons.
For example, an HPU failure due to an erroneous signal from a
BITE circuit was excluded from consideration for this reason.

f.  Data  for  components  that  are  significantly  different  in 
design  and/or  operation  was  not  grouped.  For  example,  the 
number  of  demands  experienced  by  the  isolation  valve  was 
analyzed  separately  from  the  number  of  demands  for  the  gas 
generator  valves,  since  the  gas  generator  valves  experience 
pulsing  operation  and  might  therefore  have  a different 
failure  rate.  Analyzing  such  components  together  might  have 
resulted  in  the  use  of  inapplicable  data  for  a particular 
component.

Based on the guidelines and criteria established above,
distributions were developed for the failure frequencies of
various types of components and component failure modes. The
components used for the HPU model are specified in Table 10.3-1.


10.4  FAILURE  RATES 

Once the data has been categorized, as a basis for determining
the components and failure modes for which failure rate
distributions will be needed, the next step is to specify prior
distributions for those failure rates. After that, one must
specify the relevant data for each component failure mode; i.e.,
the number of observed HPU component failures, and the number
of operating hours and/or demands to which each component was
subject. Finally, the data must be combined with the prior
distributions to yield posterior distributions. The results of
these three steps are presented in the sections below.


10.4.1  Development  of  Prior  Distributions 

A number of sources were used as background information in
developing prior distributions. These include the Nonelectronic
Parts Reliability Data (NPRD) handbook, prepared by the Rome Air
Development Center; MIL-HDBK-217D (which was used particularly
for electronic components); the Reliability Engineering Data
Series report on Failure Mechanisms, prepared by the Avco
Corporation; NASA operating life limits for the APU; and the
engineering judgment of the analysis team (based on previous
risk assessments and data analyses).

In many cases, adjustments to the information obtained from
these sources were needed. For example, many of the failure
rate estimates obtained from NPRD were for components in
aircraft or ground-based environments rather than missile
environments. Environmental adjustment factors were judged to
be a reasonable way to account for many of these differences;
factors for this purpose were obtained from the Avco Failure
Mechanisms report.

In addition, all the failure rate estimates in NPRD are presented
on a per-hour basis (H), while many of the failure rates for the
HPU risk study were needed on a per-demand basis (D). In such
cases, the number of demands per hour in a typical application
was estimated as a basis for converting the failure rate to the
desired units.
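
The conversion described above amounts to dividing an hourly
failure rate by an estimated number of demands per hour. The
following is a minimal sketch of that arithmetic only; the function
name and the numerical values are hypothetical and are not taken
from NPRD or from this study.

```python
def per_demand_rate(rate_per_hour: float, demands_per_hour: float) -> float:
    """Convert a per-hour (H) failure rate to a per-demand (D) failure rate.

    demands_per_hour is the analyst's estimate of how often the component
    is demanded in a typical application (an assumption, not source data).
    """
    if demands_per_hour <= 0:
        raise ValueError("demands_per_hour must be positive")
    return rate_per_hour / demands_per_hour


# Hypothetical valve: 2.0E-05 failures per hour, demanded about 4 times per hour
example_rate = per_demand_rate(2.0e-5, 4.0)
print(f"{example_rate:.2E} failures per demand")
```

The result is only as good as the assumed demand frequency, which is
why the report treats it as an input to a prior distribution rather
than as a point value.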

In a few cases, estimates were not available from sources such
as NPRD or MIL-HDBK-217D. In such cases, observed APU failure
experience was used in the development of the HPU priors,
since no HPU failures were available to aid in quantification.

Finally, after the initial assessment of prior distributions,
the distributions for similar components or related failure
modes were compared with each other as a reasonableness check.
For example, the failure rates for different types of rotating
equipment (e.g., the turbine, pumps, and gearbox) were compared
to assure that they were roughly comparable, and that the assigned
failure rates were consistent with engineering knowledge, such as
differing speeds at which the various types of equipment operate.

This  type  of  comparison  was  performed  to  assure  that  the  various 
failure  rates  reflected  the  correct  relative  ranking.  The 
comparison  process,  which  was  especially  important  since  many  of 
the prior distributions were based on different data sources
and/or different applications, did result in the adjustment of several
distributions  to  correspond  more  closely  with  what  the  analysis 
team  considered  realistic  for  application  to  the  space  shuttle. 

The  process  described  above  is  the  same  process  as  was  used  to 
develop  prior  distributions  for  the  APU.  Consideration  was  given 
to  adjusting  these  distributions  to  reflect  the  more  extensive 
testing  and  refurbishment  performed  on  the  HPU.  This  testing 
includes  the  following  steps: 

a. Sundstrand bench tests (i.e., acceptance tests)

b. Inspection & checkout on receipt at Kennedy Space Center

c. Helium leak testing

d. GN2 spin test of the HPU turbine

e. BITE tests before hot firing

f.  Hot  firing  of  the  HPU 

g.  Post-flight  disassembly,  refurbishment,  and  testing 

The post-flight disassembly and refurbishment in particular are
significant additions to the testing and refurbishment performed
on the APU, and might thus be expected to result in lower HPU
flight failure rates than for the APU. However, these lower
failure rates are counteracted by the harsher environments
experienced by the HPU, in particular the immersion of the HPU
in salt water after each mission. It was judged that the
competing effects of increased testing and a harsher environment
roughly canceled each other out, and that the HPU prior
distributions were within the range of uncertainty of the APU
priors.

Table  10.4-1  presents  the  prior  distributions  that  resulted 
from  this  process.  For  each  distribution,  the  table  specifies 
the  category  of  components  to  which  the  distribution  applies, 
the relevant failure mode or modes, the 5th and 95th percentiles
of the prior distribution, and the sources used in developing
that prior. (Engineering judgment is nearly always used in the
development of distributions, because there is rarely enough
data to unambiguously specify a distribution.) Virtually all the
prior  distributions  were  assumed  to  be  lognormal  in  form,  as  is 
common  practice  in  PRAs.  For  these  distributions,  the  medians  can 
be  found  as  the  geometric  mean  of  their  5th  and  95th  percentiles. 
The  only  exception  to  the  assumption  of  lognormality  is  the 
conditional  frequency  of  leaks  in  the  fuel  systems  of  additional 
HPUs,  given  that  one  HPU  is  leaking.  Because  the  95th  percentile 
of  this  frequency  was  quite  high,  a lognormal  distribution  would 
not  have  been  reasonable;  in  particular,  it  would  have  allowed 
conditional  probabilities  of  leak  significantly  greater  than  1.0. 
Therefore,  a beta  distribution  was  used  for  this  parameter  instead 
of  a lognormal  distribution. 
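
The relationship between the tabulated percentiles and the other
parameters of a lognormal prior can be sketched as follows. This is
an illustration under stated assumptions, not the procedure used by
the analysis team: the function name is hypothetical, the 5th and
95th percentile values are invented for the example, and the mean
and error factor formulas are the standard lognormal identities.

```python
import math

Z95 = 1.6449  # standard normal quantile for the 95th percentile


def lognormal_from_percentiles(p05: float, p95: float):
    """Return (median, mean, error_factor) of a lognormal distribution
    specified by its 5th and 95th percentiles."""
    median = math.sqrt(p05 * p95)                 # geometric mean of the bounds
    sigma = math.log(p95 / p05) / (2.0 * Z95)     # std. dev. of ln(rate)
    mean = median * math.exp(sigma ** 2 / 2.0)    # lognormal mean exceeds median
    error_factor = math.sqrt(p95 / p05)           # ratio of 95th pct. to median
    return median, mean, error_factor


# Hypothetical prior: 5th percentile 1.0E-05, 95th percentile 1.0E-03
median, mean, ef = lognormal_from_percentiles(1.0e-5, 1.0e-3)
print(f"median={median:.3E}  mean={mean:.3E}  error factor={ef:.1f}")
```

Because the lognormal is skewed toward high values, the mean is
always larger than the median, and the gap widens as the error
factor grows.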


10.4.2 Specification of Failure Data


Once prior distributions have been developed for each category of
components and each failure mode, the next step is to specify
the relevant data for each category; i.e., the number of observed
component failures of each type, and the number of operating
hours (H) and/or demands (D) to which each component was subject,
which can be referred to as exposure data.

No actual component failures were identified for the HPU during
flight.  The  estimation  of  exposure  data  requires  determination 
of  whether  the  relevant  failure  mode  is  likely  to  occur  over  time 
or  on  a per-demand  basis,  and  whether  a failure  would  likely  be 
detected  if  one  occurred. 

The total amount of run time accumulated on all HPUs to date
during flights and hot firings is only about 23 hours, too small
to make a difference in the failure rate estimates used in this
study, which are mostly less than 10^-3. Therefore, updates were
not performed for hourly failure rates.
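
The reason 23 hours of failure-free operation carries so little
weight can be seen from the likelihood of observing zero failures,
which is nearly the same for any hourly rate in the range of
interest. The sketch below assumes a constant failure rate (Poisson
model); the function name and the candidate rates are illustrative
assumptions.

```python
import math

EXPOSURE_HOURS = 23.0  # total HPU run time quoted in the text


def zero_failure_likelihood(rate_per_hour: float, hours: float) -> float:
    """Poisson probability of observing no failures in the given exposure."""
    return math.exp(-rate_per_hour * hours)


# Candidate hourly failure rates spanning two orders of magnitude
for rate in (1.0e-3, 1.0e-4, 1.0e-5):
    likelihood = zero_failure_likelihood(rate, EXPOSURE_HOURS)
    print(f"rate={rate:.0E}/hr  P(0 failures in 23 hr)={likelihood:.4f}")
```

Since the likelihood is almost flat across these rates, multiplying
a prior by it and renormalizing returns essentially the same
distribution, which is consistent with using the priors directly.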

For demand-based failures, the number of demands experienced by
a typical component during flights and hot firings was calculated
to be 603, due to the large number of hot firing tests performed
on the HPU. This total assumes that the component in question
experiences exactly one demand during each firing of an HPU.

Care  must  be  taken  in  attributing  exposure  data  to  particular 
components,  however.  For  example,  failures  of  the  normal  speed 
logic  gate  may  not  be  detected  unless  a change  to  high  speed  is 
required  during  a mission,  and  thus  the  relevant  exposure  time 
for  this  particular  failure  is  likely  to  be  zero. 

Table  10.4-2  presents  the  prior  distribution  and  the  failure  and 
exposure  data  for  each  basic  event  included  in  the  analysis.  As 
can  be  seen  from  that  table,  the  prior  distributions  for  hourly 
failure  rates  were  not  updated  and  were  used  directly  as  posterior 
distributions,  because  of  the  small  amount  of  exposure  time  and 
lack of failures for those events.


10.4.3 Development of Posterior Distributions


The  Bayesian  updating  process  for  demand  failure  rates  was 
performed  using  the  RISKMAN  4 computer  software  on  an  IBM  personal 
computer.  The  resulting  distributions  for  demand  failure  rates,  as 
well  as  the  distributions  for  hourly  failure  rates,  are  shown  in 
Table  10.4-3.  This  table  shows  the  mean  frequency  for  each  basic 
event,  and  also  the  5th,  50th  and  95th  percentiles. 

The Bayesian analysis used to develop the demand-based
distributions shown in Table 10.4-3 automatically assigns the
appropriate weights to the observed data and the prior
distribution, respectively, based on the relative strength of the
two types of evidence in each particular situation. For example,
when a great deal of empirical data is available, then the data
will tend to dominate the posterior. Similarly, when relatively
little empirical data is available, then the posterior
distribution will tend to resemble the prior; in this case, the
data is simply not strong enough to override the information
contained in the prior.
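
The weighting of prior and data described above can be sketched with
a discretized Bayesian update. This is not the RISKMAN algorithm,
whose internals are not described here; it is a generic illustration
that assumes a lognormal prior on a log-spaced grid and a binomial
likelihood. The prior percentiles are hypothetical; the 603 demands
with no failures are taken from Section 10.4.2.

```python
import math

Z95 = 1.6449  # standard normal quantile for the 95th percentile


def discretize_lognormal(p05, p95, n=201):
    """Return (rates, weights) approximating a lognormal prior on a grid
    that is evenly spaced in ln(rate) over +/- 4 standard deviations."""
    mu = 0.5 * (math.log(p05) + math.log(p95))
    sigma = (math.log(p95) - math.log(p05)) / (2.0 * Z95)
    xs = [mu + sigma * (-4.0 + 8.0 * i / (n - 1)) for i in range(n)]
    rates = [min(math.exp(x), 1.0) for x in xs]   # a demand probability cannot exceed 1
    raw = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs]
    total = sum(raw)
    return rates, [w / total for w in raw]


def bayes_update(rates, prior, failures, demands):
    """Multiply the prior by the binomial likelihood and renormalize.
    The binomial coefficient is omitted because it cancels on normalization."""
    likelihood = [r ** failures * (1.0 - r) ** (demands - failures) for r in rates]
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def dist_mean(rates, weights):
    return sum(r * w for r, w in zip(rates, weights))


# Hypothetical prior (5th = 1.0E-04, 95th = 1.0E-02 per demand),
# updated with 0 failures in 603 demands.
rates, prior = discretize_lognormal(1.0e-4, 1.0e-2)
posterior = bayes_update(rates, prior, failures=0, demands=603)
prior_mean = dist_mean(rates, prior)
post_mean = dist_mean(rates, posterior)
print(f"prior mean={prior_mean:.3E}  posterior mean={post_mean:.3E}")
```

With no observed failures the likelihood decreases as the rate
increases, so the update removes weight from the upper tail of the
prior and the posterior mean falls below the prior mean.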

For the demand-based basic events shown in Table 10.4-3, no
failures were observed, so the posteriors are slightly lower than
the priors. This is a result of the Bayesian inference process,
and is also intuitively reasonable. This effect is greatest when
the prior distribution extends to include fairly high failure
rates, which are inconsistent with the lack of observed failures.

The frequencies of a few basic events were described by point
estimates instead of distributions, usually on the basis that
their frequencies were negligible. For the purpose of this study
these events were assigned frequencies of zero. The events in
this category included the following:

a. A number of start-up failures, which were considered
extremely unlikely: GN2 leakage into the fuel tank at
start; failure of the gas generator at start; plugging of
the inline fuel filter and the fuel pump filter at start;
and inadvertent opening of the fuel pump relief valve.

b. Common cause failure of two or more HPUs due to a cause
other than lube oil blockage. The frequency of other
common cause failure modes was considered to be dominated
by the frequency of lube oil plugging.

c. Common cause failure of both GGVM valves in the open
position. This is considered much less likely than
independent failure of both valves due to mechanical and/or
control problems, because one of the valves fails in the open
position upon loss of power and the other one fails closed.
The detached valve seat single point failure is likewise
considered to be of very low probability.


10.5  HPU  SIE  DATA  DEVELOPMENT 

Based on the discussion of Section 9.6, two types of SIEs are
significant for the HPUs as for the APUs, namely:

a.  Events  related  to  HPU  turbine  failure  and  fragmentation 

b.  Events  related  to  HPU  fuel  (hydrazine)  leakage 

The approach to developing the Auxiliary Power Unit (APU) SIE data
for input into the Probabilistic Risk Analysis (PRA), discussed in
Section 7.6, is also valid for HPU SIE data. There are differences,
however, between the APU and HPU operation, design, and environment
that lead to differences in conditional probabilities. The HPU
starts once and runs for 160 seconds during ascent, then is
disassembled and refurbished after recovery, whereas the APU starts
at least twice per mission and is inspected after 20 hours run time
(approximately 14 missions). The HPU, since it only runs during
ascent in the nitrogen purge environment, is not subject to fuel
(hydrazine) fires, whereas the APU, as seen on STS-9, is subject
to hydrazine fires during descent because of air drawn into the
aft compartment. The HPU housing has a 26% larger containment
ring than the APU. Moreover, there is significantly less flight
critical equipment in the Solid Rocket Booster (SRB) aft skirt
area than in the Orbiter aft compartment.

Table  10.5-1  presents  the  split  fractions  required  for  input  into 
the  HPU  PRA.  For  each  SIE  conditional  probability,  the  paragraphs 
below  discuss  the  probability  of  frequency  distribution  developed 
for  input  into  the  PRA. 


10.5.1  SIE  Data  Related  to  HPU  Turbine  Failure  and  Fragmentation 

The  following  paragraphs  present  the  probability  of  frequency 
distributions  developed  to  represent  the  conditional 
probabilities  related  to  HPU  turbine  breakup  and  discuss  the 
data that support these distributions.


10.5.1.1 Probability of Turbine Failure at Normal Speed

The discussion of the probability of APU turbine failure at
normal speed in Section 7.1 is valid also for the HPU. The
analysis included the fact that the HPU is disassembled and
inspected after each mission.


10.5.1.2  Probability  of  Turbine  Failure  Due  to  Overspeed 

This  probability  is  equal  to  unity.  Since  there  is  no  overspeed 
shutoff  circuitry  on  the  HPU  to  limit  the  overspeed  peak  rate 
as  on  the  APU,  any  condition  that  causes  overspeed  will  cause 
turbine  breakup. 


Table 10.5-1 HPU Split Fractions


Name    Split Fractions

F1      Pr {HPU Turbine Fail | Primary and Secondary Valves
        Fail Open}

F3      Pr {Uncontained Shrapnel | Turbine Breakup Due to
        Overspeed}

F3N     Pr {Uncontained Shrapnel | Turbine Breakup at Normal
        Speed}

F5      Pr {Failure of Second HPU or FCE | Uncontained
        Shrapnel}

F7      Pr {Fuel Leak | Uncontained Shrapnel From Second HPU}

F12     Pr {HPU Fail | Small Leak in That HPU}

F13     Pr {HPU Fail | Small Leak in Another HPU}


10.5.1.3 Probability of Uncontained HPU Shrapnel as a
Consequence of Turbine Breakup at Overspeed

As discussed in Section 7.6.1.3, the probability of having
uncontained fragments as a result of a turbine failure is
determined by the expected breakup speed and the ability of
the APU structure to contain fragments at the expected energy
levels. The expected turbine failure speed of 108,000 RPM
(150%) presented for the APU is valid for the HPU as well.

Reference 25 presents calculations to estimate the APU/HPU turbine
overspeed required to burst the containment ring and produce
shrapnel. However, the calculations in this reference are based
on the OV101 APU/HPU containment ring design. Since the date
of this reference, the HPU containment rings were redesigned to
increase the HPU containment ring yield speed to 108,000 RPM
(150%). This increased the volume by 26%. Thus, the likelihood
of HPU fragments being uncontained is the likelihood that the
fragmentation speed will exceed 108,000 RPM. Since the expected
fragmentation speed presented is 108,000 RPM, the likelihood of
exceeding the HPU containment ring yield speed is 50%.

Allowing  for  uncertainty,  this  is  expressed  as: 


[Figure: histogram of P(f4) versus f4; horizontal axis "Frequency
of Occurrence" from 0 to 1.0]


f4       .35    .45    .55    .65

P(f4)    .2     .3     .3     .2


This  distribution  was  used  in  the  evaluation  of  event  tree  top 
event  CH  following  occurrence  of  TH. 
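
As a consistency check, the mean of the discrete distribution for
f4 can be compared with the 50% point estimate derived above. The
values are those tabulated for f4 and P(f4); the variable names are
illustrative only.

```python
# Discrete probability-of-frequency distribution for f4 (from the text)
f4_values = [0.35, 0.45, 0.55, 0.65]
p_f4 = [0.2, 0.3, 0.3, 0.2]

# The probabilities must sum to one for a valid distribution
total_probability = sum(p_f4)

# Mean of the distribution: sum of value times probability
mean_f4 = sum(f * p for f, p in zip(f4_values, p_f4))

print(f"sum of P(f4) = {total_probability:.2f}, mean f4 = {mean_f4:.2f}")
```

The distribution is symmetric about 0.5, so its mean reproduces the
50% estimate while spreading the uncertainty from 0.35 to 0.65.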


10.5.1.4 Probability of Uncontained HPU Shrapnel as a
Consequence of Turbine Failure at Normal Speed

The information presented in 10.5.1.3 is also valid for assessing
the effects of turbine failure at normal speed. However, even
though unit S/N 105 broke up at a speed below that required to
burst the containment ring, fragments bypassed the containment
ring and exited through the APU housing. This was attributed
to the effects of notches in the turbine hub (Reference 96). The
group,  in  considering  this  failure,  judged  that  any  turbine  that 
broke  up  at  normal  speed  would  have  to  be  seriously  flawed  and, 
hence,  would  bypass  the  larger  containment  ring.  The  same 
discrete  distribution  presented  in  7.1.4  was  assigned  for  the 
HPU.  This  distribution  was  used  in  the  evaluation  of  event  tree 
top  event  CH  following  occurrence  of  PH. 


10.5.1.5 Probability of a Second HPU or Flight Critical
Equipment Failure as a Consequence of Uncontained
Shrapnel from a Turbine Failure at Overspeed

Given uncontained shrapnel from a turbine failure at overspeed,
the likelihood that this shrapnel would cause a second HPU or
flight critical equipment to fail is determined by three factors:
the energy level of uncontained shrapnel, the likelihood of an
uncontained fragment striking the equipment, and the
vulnerability of the equipment.

Using the approach of Section 7.1.5, the energy of the uncontained
fragments can be estimated as the energy of the turbine hub
fragments minus the minimum energy required to burst the
containment ring. Reflecting the fact that the HPU containment
ring is 26% larger than the APU containment ring, the minimum
energy required to burst the containment ring is calculated to be
24,048 lb-ft.

The energy of HPU turbine fragments and the energy of resulting
uncontained fragments at various speeds are presented in Table
10.5-2.

The likelihood of an uncontained fragment striking a piece of
equipment must consider both the fragment spray pattern and the
location of the equipment in the SRB aft skirt area. As in the
APU, the fragment spray pattern that would result from an
uncontained HPU hub fragmentation is difficult to define because
of the lack of data and the complex HPU containment ring geometry.

The HPU fragment spray pattern was assumed to be the same as that
for the APU discussed in Section 7.1.5.


Table 10.5-2

HPU Uncontained Fragment Energies


                         Fragment     APU Uncont.
% oper.     V            Energy       Frag Energy
Speed       (RPM)        (lb-ft)      (lb-ft)

100          72,000      10,688            0
110          79,200      12,932            0
120          86,400      15,390            0
130          93,600      18,063            0
140         100,800      20,948            0
150         108,000      24,048            0
160         115,200      27,361        3,277
170         122,400      30,888        6,804
180         129,600      34,629       10,545
190         136,800      38,584       14,500
200         144,000      42,752       18,668

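
The Fragment Energy column of Table 10.5-2 can be reproduced to
within 1 lb-ft by scaling the 100% value with the square of the
speed, as expected for rotational kinetic energy. The sketch below
assumes that scaling (the report does not state the formula used);
the constant and function names are illustrative. Subtracting the
24,048 lb-ft burst energy stated in the text gives uncontained
energies that are 36 lb-ft higher than the tabulated values (for
example 3,313 rather than 3,277 lb-ft at 160%); the sketch does not
attempt to resolve that difference.

```python
BASE_ENERGY_LB_FT = 10_688.0   # fragment energy at 100% speed (Table 10.5-2)
BURST_ENERGY_LB_FT = 24_048.0  # minimum energy to burst the ring (from the text)


def fragment_energy(percent_speed: float) -> float:
    """Fragment energy in lb-ft, assuming energy scales with speed squared."""
    return BASE_ENERGY_LB_FT * (percent_speed / 100.0) ** 2


def uncontained_energy(percent_speed: float) -> float:
    """Energy remaining after bursting the containment ring (never negative)."""
    return max(0.0, fragment_energy(percent_speed) - BURST_ENERGY_LB_FT)


# Fragment Energy column of Table 10.5-2, keyed by percent operating speed
TABLE_FRAGMENT_ENERGY = {
    100: 10_688, 110: 12_932, 120: 15_390, 130: 18_063, 140: 20_948,
    150: 24_048, 160: 27_361, 170: 30_888, 180: 34_629, 190: 38_584,
    200: 42_752,
}

for pct, tabulated in TABLE_FRAGMENT_ENERGY.items():
    print(f"{pct:3d}%  computed={fragment_energy(pct):9.1f}  "
          f"table={tabulated:6d}  uncontained={uncontained_energy(pct):9.1f}")
```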

Much less information was available to support the assessment of
the likelihood of an uncontained HPU fragment striking flight
critical equipment and the vulnerability of the equipment. A
number of items of equipment are potentially subject to being
struck. However, most items are components of the HPU/Hydraulic
system containing the failed HPU and, hence, would contribute
little additional risk. The exceptions to this are the hydraulic
lines. An HPU turbine breakup has a finite likelihood of cutting
a hydraulic line from the second HPU. This possibility is judged
to be the predominant source of risk from an HPU turbine failure.
The following probability distribution was assigned.

f5 | .05 | .1 | .15 | .2 | .25 | .3 |

This  distribution  was  used  in  the  evaluation  of  event  tree  top 
event  CH  after  the  occurrence  of  TH. 


10.5.1.6  Probability  of  a Hydrazine  Leak  as  a Consequence  of 
Uncontained  Shrapnel  from  Another  HPU 


The occurrence of a hydrazine leak as a consequence of
uncontained shrapnel from another HPU is considered an unlikely
event. Because of the locations and orientations of the HPUs, it
is judged that this could only occur as a result of a fragment
ricochet or secondary shrapnel. This is expressed by assigning

This value was used in the evaluation of event tree top event FH
after the occurrence of TH or PH without the occurrence of CH.


10.5.2  SIE  Data  Related  to  HPU  Fuel  Leakage 

The following paragraphs present the probability of frequency
distributions developed to represent the conditional
probabilities related to HPU fuel leakage, and discuss the data
that support these distributions. Only those split fractions
which proved significant to the model are discussed.

As  indicated  in  Section  9.6.2  above,  leaking  HPU  fuel  (hydrazine) 
can  damage  equipment  by  means  of  corrosion,  fire,  or  detonation. 
Due  to  the  lack  of  oxygen  in  the  SRB  aft  skirt  area  during 
prelaunch  and  ascent,  combustion  cannot  occur.  Like  the  APU, 
electrical  wiring  for  the  HPU  has  insulation  consisting  of  an 
inner  layer  of  Teflon  and  an  outer  layer  of  Kapton.  Although 
given  sufficient  time  liquid  hydrazine  can  dissolve  Kapton,  it 
will  not  dissolve  Teflon.  In  addition,  the  time  available  for 
hydrazine  to  affect  wiring  in  the  aft  skirt  area  is  very  limited 
before  SRB  SEP.  Thus,  corrosion  is  not  considered  a credible 
mechanism  by  which  hydrazine  may  damage  the  HPUs. 


10.5.2.1  Probability  of  HPU  Failure  Given  a Small  Fuel  Leak  in 
That  HPU 

HPU failure given a small fuel leak in that HPU is a potential
problem shared by the APU. Development of the appropriate split
fraction for the APU is discussed in Section 7.6.3.1. The ruling
out of possible hydrazine corrosion damage to the HPU is the most
significant difference between the two cases.


After  consideration  of  the  expert  opinion,  which  was  surveyed 
at  the  1 October  1987  meeting,  the  probability  of  frequency 
distribution  adopted  for  the  split  fraction  associated  with  HPU 
failure  given  a small  fuel  leak  in  that  HPU  has  the  following 
characteristics:


Mean Frequency               1.6140 x 10^-2
5th Percentile Frequency     1.9024 x 10^-3
Median Frequency             9.7847 x 10^-3
95th Percentile Frequency    4.8681 x 10^-2


This distribution was used in the evaluation of event tree top
events BA or BB after occurrence of KA or KB respectively.


10.5.2.2  HPU  Failure  Given  a Small  Fuel  Leak  in  Another  HPU 

The  probability  of  HPU  failure  given  a small  fuel  leak  in  another 
HPU  is  less  than  the  probability  of  HPU  failure  given  a small 
fuel  leak  in  the  same  HPU.  Internal  fuel  leakage  in  another  HPU 
poses  a lesser  risk  since  the  resulting  detonation  will  produce, 
at  most,  low  energy  shrapnel.  Risk  resulting  from  thermal  damage 
to  the  HPU  by  catalytically-induced  hydrazine  decomposition  is 
less  because  of  the  distance  between  HPUs. 


After  consideration  of  the  expert  opinion,  the  probability  of 
frequency  distribution  adopted  for  the  split  fraction  associated 
with  HPU  failure  given  a small  fuel  leak  in  another  HPU  has  the 
following characteristics:


Mean Frequency               5.7036 x 10^-3
5th Percentile Frequency     9.5734 x 10^-4
Median Frequency             3.9254 x 10^-3
95th Percentile Frequency    1.5619 x 10^-2


This distribution was used in the evaluation of event tree top
events BA or BB after occurrence of KB or KA respectively.


11.0 QUANTITATIVE RESULTS OF THE HPU PRA


The Probabilistic Risk Analysis (PRA) model was constructed
from the top down. It began with illustrating the major
functions of the Shuttle, interruption of which would cause
loss of crew or vehicle, in the Master Logic Diagram (MLD).
That diagram was developed to the level of initial failure
categories of the Hydraulic Power Unit (HPU) that could lead
to the damage states Loss of Crew/Vehicle (LOC/V) after launch
or launch scrub before launch. Event sequence diagrams were
used to define and describe all significant scenarios that
could lead from an initial failure to one of the damage states.
The event trees and split fraction models provided further
detail of the scenarios in a form that is also quantifiable.
The level of detail was commensurate with the data that was
collected from various sources throughout the National
Aeronautics and Space Administration (NASA) and was generally
at a component or sub-component level.

Quantification is performed from the bottom up. Probability
distributions that reflect actuarial information about the
HPU, analysis, maintenance procedures and engineering judgment
were developed for each component, sub-component, and event
in the model. The minimal cut sets of the split fraction
models were obtained and the appropriate probability distribution
assigned to each basic event in the cut sets. The RISKMAN
software facilitated the development of algebraic equations that
represent each split fraction and, using the assigned probability
distributions, obtained the numerical value of each split
fraction in the HPU event tree. Another module of RISKMAN
combined the split fractions to obtain the frequency of each
scenario. Since each scenario was associated with a damage
state (or the OK state), scenario frequencies were summed, as
shown in Section 5.10, to obtain the total damage state
frequency.
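
The final summation step can be sketched as follows. This is a
generic event tree calculation, not RISKMAN code: it assumes each
scenario frequency is the initiating event frequency multiplied by
the split fractions along its path, and every name and number in
the example is hypothetical. It also works with point values,
whereas the study propagated full probability distributions.

```python
from collections import defaultdict
from math import prod


def damage_state_frequencies(scenarios):
    """Sum scenario frequencies by damage state.

    Each scenario is a dict with an initiating event frequency, the split
    fractions along its event tree path, and the damage state it ends in.
    """
    totals = defaultdict(float)
    for scenario in scenarios:
        frequency = scenario["initiator"] * prod(scenario["split_fractions"])
        totals[scenario["damage_state"]] += frequency
    return dict(totals)


# Hypothetical scenarios (illustrative values only, not study results)
example_scenarios = [
    {"initiator": 1.0e-3, "split_fractions": [0.5, 0.1], "damage_state": "LOC/V"},
    {"initiator": 2.0e-3, "split_fractions": [0.01], "damage_state": "LOC/V"},
    {"initiator": 1.0e-2, "split_fractions": [0.9], "damage_state": "SCRUB"},
]

totals = damage_state_frequencies(example_scenarios)
for state, frequency in totals.items():
    print(f"{state}: {frequency:.2E} per mission")
```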

The  results  of  this  study  are  presented  in  terms  of  the 
following: 


a.  Risk  profiles  of  each  damage  state  and  the  interpretation 
of  the  profiles 

b. Description of scenarios in order of their importance to
the risk profiles

c. Description of HPU component failure modes in order of
their importance to the risk profile


11.1  RISK  PROFILES 

The probability distributions shown in Figure 11.1-1 represent
the state of knowledge about the fraction of missions in which
HPU failures on either Solid Rocket Booster (SRB) would result
in loss of crew or vehicle, and the fraction of missions in
which HPU failures on either SRB would result in launch scrub.
The former fraction includes the time from launch to SRB SEP.
The latter fraction includes the time from L/O -30 seconds to
launch.

A great  deal  of  information  is  contained  in  these  distributions 
even  without  looking  further  into  what  scenarios  contribute  most 
to  them.  The  results  show  that  it  is  extremely  unlikely  that 
HPUs  would  cause  a loss  of  crew  or  vehicle  more  often  than  once 
in  about  3300  missions.  On  the  other  hand,  it  is  extremely 
unlikely  that  HPUs  would  cause  a loss  of  crew  or  vehicle  less 
often  than  once  in  about  4 million  missions.  The  90%  confidence 
bounds  are  that  the  fraction  of  missions  in  which  HPUs  would 
cause  loss  of  crew  or  vehicle  lies  between  one  in  about  1.1 
million  missions  and  one  in  about  17,600  missions. 

Similarly,  the  results  show  that  it  is  extremely  unlikely  that 
HPUs  would  cause  a launch  scrub  more  often  than  once  in  about  17 
missions.  On  the  other  hand,  it  is  extremely  unlikely  that  HPUs 
would  cause  a launch  scrub  less  often  than  once  in  about  143 
missions.  The  90%  confidence  bounds  are  that  the  fraction  of 
missions  in  which  HPUs  would  cause  a launch  scrub  lies  between 
1 in  about  88  missions  and  1 in  about  28  missions. 
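
The "one in N missions" figures quoted above are reciprocals of
per-mission frequencies. As a check, the launch scrub bounds can be
recovered from the 5th and 95th percentile frequencies given with
Figure 11.1 (1.14E-02 and 3.54E-02). The function name below is
illustrative.

```python
def missions_per_event(frequency_per_mission: float) -> float:
    """Convert a per-mission frequency to 'one event in N missions'."""
    if frequency_per_mission <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_per_mission


# Launch scrub percentiles from Figure 11.1
scrub_5th = 1.14e-2
scrub_95th = 3.54e-2

n_low = round(missions_per_event(scrub_5th))
n_high = round(missions_per_event(scrub_95th))
print(f"launch scrub: 1 in about {n_low} to 1 in about {n_high} missions")
```

Note that the lower frequency percentile corresponds to the larger
number of missions between events, which is why the bounds appear
in reversed order when expressed as "one in N".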

It is sometimes convenient to talk about probability
distributions in terms of a measure of central tendency. The mean
of the distribution is used as this measure. The mean fraction
of missions in which HPUs would cause loss of crew or vehicle
was estimated to be one in about 52,000 missions. The mean
fraction of missions in which HPUs would cause a launch scrub
was estimated to be one in about 44 missions. It was also
estimated that 97.6% of missions will be accomplished with all
HPUs operating throughout.


Figure 11.1 - HPU Failure Probability Distribution

Panels: HPU LAUNCH SCRUB and HPU LOSS OF CREW / VEHICLE, each
plotting probability density against frequency (occurrences/flight).

HPU LAUNCH SCRUB

Percentile    Frequency

5th           1.14E-02

95th          3.54E-02



The occurrence of loss of crew or vehicle associated with HPUs is
quite unlikely. This is consistent with data collected during
this study that indicated HPU components did not fail during
flight nor during hot fire tests. The low frequency is also
indicative of the prelaunch countdown procedure in which HPU
malfunctions that are detected before launch would automatically
scrub the launch. Indeed, an HPU-associated malfunction was the
cause of a launch delay, although the cause was a circuitry error
leading to a command shutdown rather than a malfunction in the HPU
itself. (Modeling this kind of circuitry malfunction was outside
the scope of this study.)

Three general factors lead to the low frequency of HPU-caused loss
of crew or vehicle. The first is the very short duration of the
mission. The HPUs are required to operate for a much shorter time
before being disassembled, inspected and refurbished than the
Auxiliary Power Units (APUs). Equipment with the same failure
rate is, therefore, far less likely to fail during the short HPU
mission than during a longer APU mission.
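
The effect of mission duration can be illustrated with a constant failure rate (exponential) model, which is an assumption made here for illustration and is not necessarily the model used in the study. The failure rate and the two durations below are hypothetical values, not data from this report.

```python
import math


def failure_probability(failure_rate_per_hour: float, mission_hours: float) -> float:
    """Probability of at least one failure, assuming a constant failure rate."""
    # 1 - exp(-rate * t), computed with expm1 for accuracy at small values.
    return -math.expm1(-failure_rate_per_hour * mission_hours)


# Hypothetical values chosen only to show the effect of duration.
rate = 1.0e-4          # failures per operating hour (assumed)
short_mission = 0.05   # hours (assumed short operating period)
long_mission = 1.5     # hours (assumed longer operating period)

p_short = failure_probability(rate, short_mission)
p_long = failure_probability(rate, long_mission)
print(f"short mission: {p_short:.3e}")
print(f"long mission:  {p_long:.3e}")
print(f"ratio:         {p_long / p_short:.1f}")
```

For small probabilities the ratio is close to the ratio of the operating times, so under this assumption equipment with the same failure rate is roughly proportionally less likely to fail on the shorter mission.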

The second is design specification. The HPUs are designed
with specifications similar to the APUs. The HPUs have a far less
taxing mission, not only in terms of duration, but in terms of the
environmental extremes that must be endured during a mission while
still operating. It appears that the HPUs have a substantial
design margin from a reliability standpoint. The third factor
is the extensive disassembly, inspection, refurbishment and
testing that takes place for each HPU component between flights.

We believe that this process (described in Section 10.4.1) is
largely responsible for the low incidence of failures during hot
fire tests before launches despite the immersion of the HPUs in
sea water at the end of each mission.


11.2  DESCRIPTION  OF  RISK  SIGNIFICANT  SCENARIOS 


11.2.1  Loss  of  Crew  or  Vehicle 

Over 99% of the risk of loss of crew or vehicle due to HPUs
is attributed to two scenarios. These are summarized in Table
11.2-1A. The most risk significant scenario (56.8% of the
frequency of loss of crew or vehicle) involves loss of two HPUs
on the same SRB from equipment malfunctions after launch and
before Solid Rocket Booster Separation (SRB SEP).
While the split fraction models described in Section 9.5 present
numerous potential equipment failure combinations, one of these
combinations has been assessed as contributing over 99% of the
frequency of this scenario: common cause blockage of the lube oil
flow path. Lube oil flow path blockage
causes a rapid overheat and failure of the bearings on the
rotating equipment in the gearbox. The blockages may be caused
by hydrazine leakage from the fuel pump seal through the drain
cavity and into the gearbox via the gearbox shaft seal. The
gearbox shaft seal shares the same seal drain cavity.

Hydrazine reacts with the lube oil to form a waxy substance that
collects on the lube oil filters and eventually blocks them. The
identified common causes that affected two HPUs were the choice
of incompatible materials (lube oil and hydrazine), and design and
fabrication of the seals and seal drain system that allowed the
two materials to intermingle. The recorded data from APUs for this
event (Table 7.5-3) indicated that three APUs had suffered flow
blockages during flights, two of which were on the same APU
during the same flight. This was one of the more significant
contributors to the loss of crew and vehicle frequency in the APU
analysis for ascent.

Unlike the APUs, the recorded failure history database of the
HPUs did not exhibit symptoms (such as high lube oil pressure)
to indicate flow blockages in HPUs. Nevertheless, because of the
similarity of the HPU design to that of the APU in this area, the
possibility of this event occurring on the HPU could not be ruled
out. However, the probability distribution for the frequency of
common cause failure of two HPUs due to lube oil flow blockage was
appropriately reduced to reflect the lack of incidents and the
shorter mission time. Although the percentage contribution is
high, the frequency of the event has been assessed as being very
small for the HPU (once in about 99,000 missions).

The  other  risk  significant  scenario  accounts  for  43%  of  the 
frequency  of  loss  of  crew  or  vehicle.  It  involves  failure  of 
an  HPU  turbine  such  that  the  turbine  breaks  into  high  energy 
fragments  while  it  is  operating  at  normal  speed.  Breakup  can 
occur  either  from  a flaw  which  could  contribute  to  accelerated 
crack  propagation,  from  fatigue,  or  from  other  causes. 

Inspections of HPU turbines after each flight have consistently
shown cracks in turbine blades.


TABLE 11.2-1A

IMPORTANCE RANKING OF HPU FAILURE SCENARIOS

LOC/V

RANK  FAILURE SCENARIO RISK CONTRIBUTORS                % CONTRIBUTION

1     Equipment failure of 2 HPUs on the same SRB            56.8
      between launch and SRB SEP

      Contributors and % Contribution to Scenario 1:

      a. Common cause restriction of lube oil flow
         causing bearing overheat and failure of
         rotating equipment in the gearbox (99%)

2     Turbine failure leading to shrapnel induced            43.0
      failure of a second HPU or other flight
      critical equipment between launch and SRB SEP

      Contributors and % Contribution to Scenario 2:

      a. Turbine fragmentation at normal speed (100%)

3     All Others                                              0.2

      TOTAL                                                 100.0

Turbine breakup, of course, guarantees the failure of at least
one HPU. The turbine may fail in a way that causes it to wobble
on its axis of rotation such that, when it comes apart, the pieces
are not thrown precisely radially outward in the normal plane of
rotation and therefore miss the containment ring. Tests have
demonstrated that the portion of the turbine casing that is not
reinforced with the containment ring does not retain the fragments.
These fragments become high energy projectiles capable of damaging
other equipment.

The potential path and range of energy of the shrapnel were
analyzed along with the strength of the materials that could be
in its path. There is a chance that the shrapnel will pierce
the hydrazine tank of the same HPU that suffered the turbine
failure. The subsequent release of large amounts of hydrazine
could damage the insulation of wiring associated with the other
HPU, thereby failing the second system. The wiring insulation
material of the HPU, however, is made of Teflon, which is
resistant to the corrosive property of hydrazine. A very low
distribution was therefore assigned for the frequency of failing
the second HPU or some other flight critical equipment in the aft
skirt. The distribution estimates that about 1 in 100 turbine
failures would result in shrapnel-induced damage leading to loss
of crew or vehicle. The overall frequency of this scenario is
about one in 128,000 missions.
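
As a rough point-estimate check, the scenario frequency can be read as the turbine breakup frequency multiplied by the conditional probability that breakup leads to loss of crew or vehicle. The sketch below works backward from the two figures quoted above. The implied breakup frequency is an inference made here for illustration and is not a value stated in the study, which works with full probability distributions rather than point values.

```python
# Figures quoted in the text (point values used for illustration only).
scenario_frequency = 1.0 / 128_000   # shrapnel-induced LOC/V per mission
p_locv_given_breakup = 1.0 / 100     # fraction of breakups leading to LOC/V

# Implied turbine breakup frequency per mission (an inference, not source data).
implied_breakup_frequency = scenario_frequency / p_locv_given_breakup

print(f"implied breakup frequency: 1 in about "
      f"{1.0 / implied_breakup_frequency:.0f} missions")
```

Because products of distribution means do not in general equal the mean of the product, this should be read only as an order-of-magnitude consistency check.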



11.2.2  Launch  Scrub 

Table 11.2-1B shows that over 99% of the frequency of launch
scrub is attributed to two failure scenarios. The most important
scenario accounts for 98.4% of the launch scrub frequency. This
scenario represents those HPU failures that occur upon attempting
to start the HPUs at L/O -30 seconds.

The  other  scenario  comprises  1.5%  of  the  launch  scrub  frequency. 
It  involves  run  failures  of  equipment  in  a single  HPU  during  the 
30  seconds  before  launch.  These  are  failures  that  would  cause 
the  HPU  to  cease  operating.  Violations  of  launch  commit  criteria 
that  allow  the  HPUs  to  continue  operating  were  not  included  in 
the  scope  of  this  study. 

TABLE 11.2-1B

IMPORTANCE RANKING OF HPU FAILURE SCENARIOS

LAUNCH SCRUB

RANK  FAILURE SCENARIO RISK CONTRIBUTORS                % CONTRIBUTION

1     Failure to start an HPU at Lift-off -30                98.4
      seconds

      Contributors and % Contribution to Scenario 1:

      a. Secondary control valve leaks before
         isolation valve is opened (11%)

      b. Fuel tank isolation valve fails to open
         at start (mechanical failure) (11%)

      c. Primary control valve fails to close
         at start (mechanical failure) (11%)

      d. Secondary control valve fails to open
         at start (mechanical failure) (11%)

      e. Failure of electric power to isolation
         valve (10%)

      f. Failure of electric power to secondary
         valve (10%)

      g. MPU 1 fails high at start (9%)

      h. MPU 2 fails high at start (9%)

      i. Fuel pump bypass valve fails to open
         (6%)

      j. Fuel pump bypass valve fails to close
         (6%)

      k. Primary valve controller fails off (2%)

      l. Secondary valve controller fails off
         (2%)

TABLE 11.2-1B (Concluded)

IMPORTANCE RANKING OF HPU FAILURE SCENARIOS

LAUNCH SCRUB

RANK  FAILURE SCENARIO RISK CONTRIBUTORS                % CONTRIBUTION

2     Failure of the HPU to continue operating                1.5
      after start and before launch

      Contributors and % Contribution to Scenario 2:

      a. Primary control valve transfers closed
         and stays closed while pulsing (27%)

      b. Lube oil flow path blocked (27%)

      c. MPU 1 output fails high (13%)

      d. MPU 2 output fails high (13%)

      e. Turbine wheel fragments while running
         at normal speed (8%)

      f. Fuel pump filter blocked (3%)

      g. Gas generator fails (2%)

      All Others                                              0.1

      TOTAL                                                 100.0

11.3  FAILURE MODE IMPORTANCE RANKING


Another way to dissect the results is to perform sensitivity
studies on the importance of individual failure modes to the
overall frequency of each damage state. This was done by
numerous requantifications of the HPU risk model. For each
requantification a different failure mode was assigned a failure
frequency of zero. In other words, the component was assumed to
be perfect with respect to that failure mode. In general, the
requantification yields an estimate of the damage state frequency
that is lower than the base case. The following importance
parameter was, therefore, used to rank the individual failure
modes:

$$ I = \frac{\text{BASELINE QUANTIFICATION} - \text{REQUANTIFICATION}}{\text{BASELINE QUANTIFICATION}} $$

The results shown in Table 11.2-2 are normalized by a factor
representing the summation of all $I$. The failure modes shown
in the table represent over 99% of their respective damage
state frequencies.
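
The requantification procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_model`, its three failure modes and their frequencies are hypothetical stand-ins for the HPU risk model, which is not reproduced here.

```python
from typing import Callable, Dict


def importance_ranking(
    model: Callable[[Dict[str, float]], float],
    frequencies: Dict[str, float],
) -> Dict[str, float]:
    """Rank failure modes by (baseline - requantification) / baseline.

    Each mode is set to zero frequency in turn ("perfect" component) and the
    model is requantified. Results are normalized to sum to 100%.
    """
    baseline = model(frequencies)
    raw = {}
    for mode in frequencies:
        perfect = dict(frequencies)
        perfect[mode] = 0.0  # component assumed perfect for this failure mode
        raw[mode] = (baseline - model(perfect)) / baseline
    total = sum(raw.values())
    ordered = sorted(raw.items(), key=lambda item: item[1], reverse=True)
    return {mode: 100.0 * value / total for mode, value in ordered}


def toy_model(f: Dict[str, float]) -> float:
    """Hypothetical damage state: mode_a alone, or mode_b and mode_c together."""
    return f["mode_a"] + f["mode_b"] * f["mode_c"]


# Hypothetical per-mission frequencies, for illustration only.
example = {"mode_a": 1.0e-5, "mode_b": 1.0e-3, "mode_c": 2.0e-3}
ranking = importance_ranking(toy_model, example)
for mode, percent in ranking.items():
    print(f"{mode}: {percent:.1f}%")
```

Normalization matters because the raw importance values need not sum to one: in the toy model, removing either `mode_b` or `mode_c` eliminates the same combined failure, so their raw contributions overlap.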


11.4  INTERPRETATION  OF  RESULTS 

Loss of crew or vehicle associated with HPU-initiated scenarios
has been assessed as highly unlikely relative to the risk to the
vehicle from APU-initiated scenarios. This is primarily because
of the much shorter HPU mission duration. It appears that the
extensive refurbishment and pre-flight checkout procedure of the
HPUs effectively compensates for their immersion in sea water at
the end of each flight.

Only two HPU failure modes account for about 98% of the frequency
of loss of crew or vehicle. These are restricted lube oil
circulation and turbine wheel failure.

The  results  indicate  that  the  APU  should  receive  much  higher 
management  attention  for  resource  allocation  to  reduce  the  risk 
to  the  vehicle  than  should  the  HPU. 

For  those  resources  that  are,  nevertheless,  allocated  to  the  HPU, 
the  above  two  items  should  receive  a far  higher  priority  than  all 
other  failures.  Although  other  failures  can  also  lead  to  loss  of 
crew  and  vehicle,  they  have  been  estimated  to  be  of  such  low 
frequency  that  fixing  them  would  provide  negligible  reduction  of 
risk. 

TABLE 11.2-2

IMPORTANCE RANKING OF HPU
FAILURE MODES

LOSS OF CREW OR VEHICLE

RANKING  COMPONENT/ASSEMBLY RISK CONTRIBUTORS          % CONTRIBUTION

1        Lube oil circulation restricted                    55.0

2        Turbine wheel failure                              43.0

3        Primary control valve transfers                     1.0
         closed while pulsing

4        All other failures                                  1.0

         TOTAL                                             100.0

TABLE 11.2-2 (Concluded)

IMPORTANCE RANKING OF HPU
FAILURE MODES

LAUNCH SCRUB

RANKING  COMPONENT/ASSEMBLY RISK CONTRIBUTORS          % CONTRIBUTION

1        Secondary control valve leaks before               12.0
         isolation valves open

2        Fuel tank isolation valve fails to                 12.0
         open on demand

3        Primary control valve fails to close               12.0
         when HPU started (mechanical failure)

4        Secondary control valve fails to open              12.0
         when HPU started (mechanical failure)

5        Loss of electric power to isolation                10.5
         valves

6        Loss of electric power to secondary                10.5
         control valve

7        MPU 1 fails high on start                           9.0

8        MPU 2 fails high on start                           9.0

9        Fuel pump bypass valve fails to open                6.0
         on start

10       Fuel pump bypass valve fails to close               6.0
         on demand when pump is operating

11       All Other Failures                                  1.0

         TOTAL                                             100.0

To reduce the risk associated with the HPUs, we
recommend the following actions:

a.  Change  the  design  of  the  seal  leakage  cavity  such  that  the 
flow  path  from  the  fuel  pump  seal  to  the  gearbox  shaft 
seal  is  eliminated. 

b.  Continue  thorough  flushing  and  cleanup  of  the  lube  oil 
lines  and  filter. 

c.  Investigate  and  determine  the  cause  of  turbine  wheel  blade 
cracking.  Change  design  or  operation  to  eliminate  the 
cause. 

The study results indicate that resources spent on other failure
modes would be of far less benefit.

overspeed is caused by a secondary GG valve stuck in
mid-position.)

68.  Response to Action Item 37, NRA-046, September 9, 1987.
(Loss of control due to loss of two APUs during ascent.)

69.  Response to Action Item 15, NRA-048, September 8, 1987.

72.  Response to Action Item 36, NRA-051, September 12, 1987.
(The lube oil ignition temperature in the gearbox
and leaking lube oil ignition in the aft compartment.)

73.  Response to Action Item 5, NRA-052, September 9, 1987.
(The possibility of main engine (propellant) detonation
due to an APU fuel fire.)

74.  Estimated failure rates for Hybrid Drivers & Remote
Power Controllers. NRA-053, September 11, 1987.

75.  Response to Action Item 7, NRA-054, September 9, 1987.

APU exhaust leak.)

77.  Response to Action Item 12, NRA-057, September 12, 1987.
(Gradual GG bed contamination as an impending failure
mode).

78.  Response to Action Item 2, NRA-059, September 16, 1987.

13.0  PROOF OF CONCEPT STUDY ACRONYMS LIST


AFB       - Air Force Base
Al        - Aluminum
AOA       - Abort-Once-Around
APU       - Auxiliary Power Unit
ARCS      - Aft Reaction Control System (Subsystem)
ASSY      - Assembly
ATCS      - Active Thermal Control Subsystem
ATO       - Abort-To-Orbit
ATP       - Acceptance Test Procedure
ATT       - Attitude
ATVC      - Ascent Thrust Vector Control
AV        - Avionics
BITE      - Built-In Test Equipment
C&W       - Caution and Warning
CAL       - Calibration
CAR       - Corrective Action Reports
CB        - Circuit Breaker
CIL       - Critical Items List
CKO       - Checkout
CKT       - Circuit
CL        - Close (Closed)
CLS       - Closes
CMD       - Command, Commander
CNTL      - Control
CNTLR     - Controller
CO        - Carbon Monoxide
CO2       - Carbon Dioxide
C/O       - Checkout
CR        - Confidence Run
CRES      - Corrosion Resistant Steel
CRIT      - Criticality
CRT       - Cathode-Ray Tube
D&C       - Displays and Controls
D         - Demand
D/O       - Deorbit
delta P   - Differential Pressure
DFI       - Development Flight Instrumentation
displ     - Display
DIST      - Distribution
DMD/HR    - Demand per hour
DOD       - Department of Defense
DSC       - Dedicated Signal Conditioner
EGT       - Exhaust Gas Temperature
EI        - Entry Interface (400,000 ft. During Entry)
Elec      - Electrical
ENA       - Enable
EPDC      - Electrical Power Distribution and Control
EPS       - Electrical Power System
ESD       - Event Sequence Diagram
ET        - External Tank, Event Tree
Exh       - Exhaust
Exhst     - Exhaust
F         - Fahrenheit
FA        - Flight Aft
FCE       - Flight Critical Equipment
FCS       - Flight Control System
FDLINE    - Feed Line
FDA       - Fault Detection and Annunciation
FDF       - Flight Data File
FF        - Flight Forward
FIV       - Fuel Isolation Valve
FLT       - Flight
FM        - Failure Mode
FMEA      - Failure Modes and Effects Analysis
FPL       - Full Power Level (Main Engine @ 109% Rated Thrust)
frag.     - Fragment
FRCS      - Forward Reaction Control System (Subsystem)
FRF       - Flight Readiness Firings
FPR       - Full Problem Record
FSM       - Fuel Supply Module
FSSR      - Flight Systems Software Requirements
FSW       - Flight Software
ft        - Feet
FU        - Fuel
FWD       - Forward
G         - Gravity
GB        - Gearbox
Gen       - Generator
GFE       - Government Furnished Equipment
GG        - Gas Generator
GGVM      - Gas Generator Valve Module
GN2       - Gaseous Nitrogen
GNC       - Guidance, Navigation, and Control
GND       - Ground
GO2       - Gaseous Oxygen
GPC       - General Purpose Computer
GPM       - Gallons per Minute
GSE       - Ground Support Equipment
H         - Hours
H2        - Hydrogen
H2O       - Water
HA        - Hazard Analysis
HDC       - Hybrid Driver Controller
He        - Helium
HEX       - Heat Exchanger
Hg        - Mercury
HPU       - Hydraulic Power Unit
HW        - Hardware
HYD       - Hydraulics
IA        - Intact Abort
IEA       - Integrated Electronics Assembly
ID        - Identifier
ID        - Inside Diameter
IFM       - In-Flight Maintenance
INS       - Insertion
IOA       - Independent Orbiter Assessment
ISO       - Isolation
ISOL      - Isolation
JSC       - Johnson Space Center
Kft       - 1000 Feet
KSC       - Kennedy Space Center
L         - Left
LA        - Launch Abort
lb        - Pound
L/O       - Lift Off
L/OFF     - Lift Off
LF        - Launch Forward
LH        - Left Hand
LH2       - Liquid Hydrogen
LL        - Launch Left
LO2       - Liquid Oxygen
LOCV      - Loss Of Crew/Vehicle
LOM       - Loss of Mission
LOX       - Liquid Oxygen
LPS       - Launch Processing System
LR        - Launch Right
LRU       - Line Replaceable Unit
LS        - Launch Scrub
LT        - Light
LUBE      - Lubrication, Lubricating
LV        - Loss Of Crew or Vehicle
MAN       - Manual
MANF      - Manifold
MCC       - Mission Control Center (JSC)
MDAC      - McDonnell Douglas Astronautics Company
MDAC-ES   - McDonnell Douglas Astronautics Company-
            Engineering Services
MDF       - Minimum Duration Flight
MDM       - Multiplexer/Demultiplexer
ME        - Main Engine
MECO      - Main Engine Cutoff
MET       - Mission Elapsed Time
MLD       - Master Logic Diagram
MLG       - Main Landing Gear
MM        - Major Mode
MMH       - Monomethyl Hydrazine
MEC       - Master Events Controller
MN        - Main
MON       -
MPL       - Minimum Power Level (Main Engine @ 65% Rated Thrust)
MPS       - Main Propulsion System (Subsystem)
MPU       - Magnetic Pickup Unit
ms        - Millisecond
MSFC      - Marshall Space Flight Center
MTR       - Motor
MUX       - Multiplexer
N2        - Nitrogen
N2H4      - Hydrazine
N2O4      - Nitrogen Tetroxide
N/A       - Not Applicable
NA        - Not Applicable
NASA      - National Aeronautics and Space Administration
NC        - Normally Closed
NGTD      - Nose Gear Touch Down
NH3       - Ammonia
NLG       - Nose Landing Gear
NO        - Normally Open, Number
NPRD      - Nonelectronic Parts Reliability Data
NRA       - Numerical Risk Assessment
NSTS      - National Space Transportation System
NW        - Nose Wheel
NWS       - Nose-Wheel Steering
O2        - Oxygen
OFT       - Orbital Flight Test
OI        - Operational Instrumentation
OMI       - Operational Maintenance Instructions
OMS       - Orbital Maneuvering System
OP        - Open
Oper      - Operation
OPS       - Operations Sequence
OXID      - Oxidizer
P/L       - Payload
PC        - Personal Computer, Printed Circuit
Pc        - Chamber Pressure
PF        - Payload Forward, Permanent Failure
PI        - Principal Investigator
PL        - Primary Landing Site Entry
PLB       - Payload Bay
PLG       - Pickard, Lowe and Garrick, Inc.
PLS       - Primary Landing Site
PLT       - Pilot
PM        - Project Manager
PNL       - Panel
POS       - Position
PRA       - Probabilistic Risk Assessment
PRCS      - Primary Reaction Control System (jet)
PREP      - Preparation
PRESS     - Pressure
Prim      - Primary
psi       - Pounds per Square Inch
psia      - Pounds per Square Inch Absolute
psid      - Pounds per Square Inch Differential
psig      - Pounds per Square Inch Gage
PWR       - Power
QC        - Quality Control
QD        - Quick Disconnect
QRA       - Quantitative Risk Assessment
QUAL      - Qualification Test
R         - Right, Roll
RCS       - Reaction Control System
Ref.      - Reference
REV       - Revision
RF        - Recoverable Failure
RH        - Right Hand
RPC       - Remote Power Controller
RPL       - Rated Power Level (Main Engine @ 100% Rated Thrust)
RPM       - Revolutions Per Minute, Rotations Per Minute
Rt        - Right
RTLS      - Return to Launch Site
S/N       - Serial Number
scfm      - Standard Cubic Feet per Minute
SD        - Shutdown
S/D       - Shutdown
SEC       - Secondary
SEP       - Separation
SFOM      - Shuttle Flight Operations Manual
SFP       - Single Failure Point
SIE       - Spatial Interaction Event
SPEC      - Specification
SR        - Stop Roll
SR&QA     - Safety, Reliability and Quality Assurance
SRB       - Solid Rocket Booster
SRM       - Solid Rocket Motor
SSM       - Subsystem Manager
SSME      - Space Shuttle Main Engine
SSSH      - Space Shuttle Systems Handbook
STS       - Space Transportation System
SW        - Switch
SYS       - System
T-0       - Time Zero (Also Commonly Used for L/Off)
TAEM      - Terminal Area Energy Management
TAL       - Transatlantic Abort Landing
TD        - Touch Down (Vehicle)
Ti        - Titanium
TIG       - Time Of Ignition
TK        - Tank
TPS       - Thermal Protection System
trans.    - Transducer
TVC       - Thrust Vector Control
uncont.   - Uncontained
U.S.      - United States
USBI      - United Space Boosters Inc.
VAX       - A computer manufactured by Digital
            Equipment Corporation
VDC       - Volts, dc
VERN      - Vernier
VLV       - Valve
VRCS      - Vernier Reaction Control System (jet)
WONG      - Weight on Nose Gear
WOW       - Weight on Wheels (Main Landing Gear)
WS        - Wheel Stop
WSB       - Water Spray Boiler
WUC       - Work Unit Code
XDCR      - Transducer
Xo        - X-Axis of Orbiter
XFR       - Transfer
Y         - Yaw

